ata: sata_mv: Fix the return value of the probe function

Message ID 1634795836-1803-1-git-send-email-zheyuma97@gmail.com
State New

Commit Message

Zheyu Ma Oct. 21, 2021, 5:57 a.m. UTC
mv_init_host() propagates the value returned by mv_chip_id() which in turn
gets propagated by mv_pci_init_one() and hits local_pci_probe().

During driver probing, the probe function should return < 0 on failure;
otherwise the kernel treats a return value > 0 as success.

Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
---
 drivers/ata/sata_mv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
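
The return-value convention described in the commit message can be sketched in userspace as follows. This is a simplified model, not the kernel's code: `struct model_dev` and `model_local_pci_probe()` are hypothetical names, and the only behavior modeled is that a negative value is a failure, 0 is success, and a positive value is warned about but still treated as success.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a PCI device; it only tracks whether a driver is bound. */
struct model_dev {
	bool bound;
};

/*
 * Simplified model of how the PCI core interprets a probe return value:
 * negative means failure, 0 means success, and a positive value is
 * unexpected but still ends up treated as success (with a warning).
 */
static int model_local_pci_probe(struct model_dev *dev, int probe_rc)
{
	if (probe_rc < 0) {
		dev->bound = false;	/* probe failed, the driver is not bound */
		return probe_rc;
	}
	if (probe_rc > 0)
		fprintf(stderr, "Driver probe function unexpectedly returned %d\n",
			probe_rc);
	dev->bound = true;		/* 0 and > 0 both end up bound */
	return 0;
}
```

Under this model a probe that returns 1 leaves the device bound, while a probe that returns -ENODEV does not.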

Comments

Sergey Shtylyov Oct. 21, 2021, 8:37 a.m. UTC | #1
On 21.10.2021 8:57, Zheyu Ma wrote:

> mv_init_host() propagates the value returned by mv_chip_id() which in turn
> gets propagated by mv_pci_init_one() and hits local_pci_probe().
> 
> During the process of driver probing, the probe function should return < 0
> for failure, otherwise, the kernel will treat value > 0 as success.
> 
> Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
> ---
>   drivers/ata/sata_mv.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/ata/sata_mv.c b/drivers/ata/sata_mv.c
> index 9d86203e1e7a..7461fe078dd1 100644
> --- a/drivers/ata/sata_mv.c
> +++ b/drivers/ata/sata_mv.c
> @@ -3897,7 +3897,7 @@ static int mv_chip_id(struct ata_host *host, unsigned int board_idx)
>   
>   	default:
>   		dev_err(host->dev, "BUG: invalid board index %u\n", board_idx);
> -		return 1;
> +		return -ENODEV;

    Doesn't -EINVAL fit better here?

[...]

MBR, Sergey
Damien Le Moal Oct. 21, 2021, 10:38 a.m. UTC | #2
On 2021/10/21 17:37, Sergey Shtylyov wrote:
> On 21.10.2021 8:57, Zheyu Ma wrote:
> 
>> mv_init_host() propagates the value returned by mv_chip_id() which in turn
>> gets propagated by mv_pci_init_one() and hits local_pci_probe().
>>
>> During the process of driver probing, the probe function should return < 0
>> for failure, otherwise, the kernel will treat value > 0 as success.
>>
>> Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
>> ---
>>   drivers/ata/sata_mv.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/ata/sata_mv.c b/drivers/ata/sata_mv.c
>> index 9d86203e1e7a..7461fe078dd1 100644
>> --- a/drivers/ata/sata_mv.c
>> +++ b/drivers/ata/sata_mv.c
>> @@ -3897,7 +3897,7 @@ static int mv_chip_id(struct ata_host *host, unsigned int board_idx)
>>   
>>   	default:
>>   		dev_err(host->dev, "BUG: invalid board index %u\n", board_idx);
>> -		return 1;
>> +		return -ENODEV;
> 
>     Doesn't -EINVAL fit better here?

If the error message is correct and this can only happen if there is a bug
somewhere, I do not think the error code really matters much. The dev_err()
should probably be changed to dev_alert() or even dev_crit() for this case.

> 
> [...]
> 
> MBR, Sergey
>
Zheyu Ma Oct. 21, 2021, 11:23 a.m. UTC | #3
On Thu, Oct 21, 2021 at 6:38 PM Damien Le Moal
<damien.lemoal@opensource.wdc.com> wrote:
>
> On 2021/10/21 17:37, Sergey Shtylyov wrote:
> > On 21.10.2021 8:57, Zheyu Ma wrote:
> >
> >> mv_init_host() propagates the value returned by mv_chip_id() which in turn
> >> gets propagated by mv_pci_init_one() and hits local_pci_probe().
> >>
> >> During the process of driver probing, the probe function should return < 0
> >> for failure, otherwise, the kernel will treat value > 0 as success.
> >>
> >> Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
> >> ---
> >>   drivers/ata/sata_mv.c | 2 +-
> >>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/ata/sata_mv.c b/drivers/ata/sata_mv.c
> >> index 9d86203e1e7a..7461fe078dd1 100644
> >> --- a/drivers/ata/sata_mv.c
> >> +++ b/drivers/ata/sata_mv.c
> >> @@ -3897,7 +3897,7 @@ static int mv_chip_id(struct ata_host *host, unsigned int board_idx)
> >>
> >>      default:
> >>              dev_err(host->dev, "BUG: invalid board index %u\n", board_idx);
> >> -            return 1;
> >> +            return -ENODEV;
> >
> >     Doesn't -EINVAL fit better here?
>
> If the error message is correct and this can only happen if there is a bug
> somewhere, I do not think the error code really matters much. The dev_err()
> should probably be changed to dev_alert() or even dev_crit() for this case.
>

I don't think so, the error code does matter. If mv_chip_id() returns
1 which eventually causes the probe function to return 1, then the
kernel will assume that the driver and the hardware match successfully
(even if that is not the case), which will cause the following error
if modprobe is called to remove the driver.

[   21.944486] general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#1] PREEMPT SMP KASAN PTI
[   21.945317] KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
[   21.954442] Call Trace:
[   21.954624]  ? scsi_remove_host+0x32/0x660
[   21.954923]  ? lockdep_hardirqs_on+0x7e/0x110
[   21.955240]  ? _raw_spin_unlock_irqrestore+0x30/0x60
[   21.955634]  ? mutex_lock_io_nested+0x60/0x60
[   21.956027]  ? _raw_spin_unlock_irqrestore+0x41/0x60
[   21.956395]  ? async_synchronize_cookie_domain+0x35f/0x4a0
[   21.956802]  ? async_synchronize_full_domain+0x20/0x20
[   21.957179]  ? lock_release+0x63f/0x8f0
[   21.957468]  mutex_lock_nested+0x1b/0x30
[   21.957761]  scsi_remove_host+0x32/0x660
[   21.958054]  ata_host_detach+0x75d/0x830
[   21.958349]  ata_pci_remove_one+0x3b/0x40
[   21.958649]  pci_device_remove+0xa9/0x250
[   21.958949]  ? pci_device_probe+0x7d0/0x7d0
[   21.959261]  device_release_driver_internal+0x4f7/0x7a0
[   21.959647]  driver_detach+0x1e8/0x2c0
[   21.959929]  bus_remove_driver+0x134/0x290
[   21.960234]  ? sysfs_remove_groups+0x97/0xb0
[   21.960552]  driver_unregister+0x77/0xa0
[   21.960859]  pci_unregister_driver+0x2c/0x1c0
[   21.961178]  cleanup_module+0x15/0x28 [sata_mv]

This is not the case if the correct error code is returned.
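
One possible reading of the trace above, sketched as a userspace model. It rests on an assumption that the thread does not confirm: that the early exit from probe leaves the SCSI host pointer unset, so the remove path later dereferences NULL. All names here (`struct model_host`, `struct model_pdev`, `model_remove_would_fault()`) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state; scsi_host is assumed to stay NULL until activation completes. */
struct model_host {
	void *scsi_host;
};

struct model_pdev {
	bool bound;		/* set when the core treats probe as successful */
	struct model_host host;
};

/* Returns true if the modeled remove path would touch a NULL scsi_host. */
static bool model_remove_would_fault(const struct model_pdev *dev)
{
	if (!dev->bound)
		return false;	/* remove is never called for an unbound device */
	return dev->host.scsi_host == NULL;
}
```

In this model the fault is only reachable when the device is bound despite the incomplete initialization, which is what a positive probe return value would cause.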

> >
> > [...]
> >
> > MBR, Sergey
> >
>
>
> --
> Damien Le Moal
> Western Digital Research

Regards,
Zheyu Ma
Damien Le Moal Oct. 22, 2021, 1:40 a.m. UTC | #4
On 2021/10/21 20:23, Zheyu Ma wrote:
> On Thu, Oct 21, 2021 at 6:38 PM Damien Le Moal
> <damien.lemoal@opensource.wdc.com> wrote:
>>
>> On 2021/10/21 17:37, Sergey Shtylyov wrote:
>>> On 21.10.2021 8:57, Zheyu Ma wrote:
>>>
>>>> mv_init_host() propagates the value returned by mv_chip_id() which in turn
>>>> gets propagated by mv_pci_init_one() and hits local_pci_probe().
>>>>
>>>> During the process of driver probing, the probe function should return < 0
>>>> for failure, otherwise, the kernel will treat value > 0 as success.
>>>>
>>>> Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
>>>> ---
>>>>   drivers/ata/sata_mv.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/ata/sata_mv.c b/drivers/ata/sata_mv.c
>>>> index 9d86203e1e7a..7461fe078dd1 100644
>>>> --- a/drivers/ata/sata_mv.c
>>>> +++ b/drivers/ata/sata_mv.c
>>>> @@ -3897,7 +3897,7 @@ static int mv_chip_id(struct ata_host *host, unsigned int board_idx)
>>>>
>>>>      default:
>>>>              dev_err(host->dev, "BUG: invalid board index %u\n", board_idx);
>>>> -            return 1;
>>>> +            return -ENODEV;
>>>
>>>     Doesn't -EINVAL fit better here?
>>
>> If the error message is correct and this can only happen if there is a bug
>> somewhere, I do not think the error code really matters much. The dev_err()
>> should probably be changed to dev_alert() or even dev_crit() for this case.
>>
> 
> I don't think so, the error code does matter. If mv_chip_id() returns
> 1 which eventually causes the probe function to return 1, then the
> kernel will assume that the driver and the hardware match successfully
> (even if that is not the case), which will cause the following error
> if modprobe is called to remove the driver.

What I meant is that -EINVAL or -ENODEV or any other proper error code do not
really matter much since this error seems to happen only if there is something
really wrong with the setup. "return 1" as it was definitely seems wrong.

Given that the problem triggers with an invalid board_idx, -EINVAL seems more
appropriate. But I would also change that message to dev_crit() or dev_alert()
since this is a bug rather than a recoverable runtime error.

Can you resend the patch with these changes?
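
A rough userspace sketch of what the requested revision might look like, for illustration only. `model_dev_crit()` is a stand-in macro for the kernel's dev_crit(), which takes a struct device pointer as its first argument, and the board indices are hypothetical rather than the driver's real enum.

```c
#include <errno.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's dev_crit(); the real one takes a struct device *. */
#define model_dev_crit(fmt, ...) fprintf(stderr, "crit: " fmt, __VA_ARGS__)

/* Hypothetical board indices, for illustration only; not the driver's real enum. */
enum model_board { MODEL_BOARD_A, MODEL_BOARD_B, MODEL_BOARD_COUNT };

static int model_chip_id(unsigned int board_idx)
{
	switch (board_idx) {
	case MODEL_BOARD_A:
	case MODEL_BOARD_B:
		return 0;	/* known board: setup would continue here */
	default:
		/* An unknown index indicates a bug, so log at critical level. */
		model_dev_crit("BUG: invalid board index %u\n", board_idx);
		return -EINVAL;	/* negative errno, so probe fails cleanly */
	}
}
```

Any negative errno would make the probe fail; -EINVAL is used here because the failure is triggered by an invalid argument.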

> 
> [   21.944486] general protection fault, probably for non-canonical
> address 0xdffffc000000001b: 0000 [#1] PREEMPT SMP KASAN PTI
> [   21.945317] KASAN: null-ptr-deref in range
> [0x00000000000000d8-0x00000000000000df]
> [   21.954442] Call Trace:
> [   21.954624]  ? scsi_remove_host+0x32/0x660
> [   21.954923]  ? lockdep_hardirqs_on+0x7e/0x110
> [   21.955240]  ? _raw_spin_unlock_irqrestore+0x30/0x60
> [   21.955634]  ? mutex_lock_io_nested+0x60/0x60
> [   21.956027]  ? _raw_spin_unlock_irqrestore+0x41/0x60
> [   21.956395]  ? async_synchronize_cookie_domain+0x35f/0x4a0
> [   21.956802]  ? async_synchronize_full_domain+0x20/0x20
> [   21.957179]  ? lock_release+0x63f/0x8f0
> [   21.957468]  mutex_lock_nested+0x1b/0x30
> [   21.957761]  scsi_remove_host+0x32/0x660
> [   21.958054]  ata_host_detach+0x75d/0x830
> [   21.958349]  ata_pci_remove_one+0x3b/0x40
> [   21.958649]  pci_device_remove+0xa9/0x250
> [   21.958949]  ? pci_device_probe+0x7d0/0x7d0
> [   21.959261]  device_release_driver_internal+0x4f7/0x7a0
> [   21.959647]  driver_detach+0x1e8/0x2c0
> [   21.959929]  bus_remove_driver+0x134/0x290
> [   21.960234]  ? sysfs_remove_groups+0x97/0xb0
> [   21.960552]  driver_unregister+0x77/0xa0
> [   21.960859]  pci_unregister_driver+0x2c/0x1c0
> [   21.961178]  cleanup_module+0x15/0x28 [sata_mv]
> 
> This is not the case if the correct error code is returned.
> 
>>>
>>> [...]
>>>
>>> MBR, Sergey
>>>
>>
>>
>> --
>> Damien Le Moal
>> Western Digital Research
> 
> Regards,
> Zheyu Ma
>
Damien Le Moal Oct. 22, 2021, 1:41 a.m. UTC | #5
On 2021/10/21 20:23, Zheyu Ma wrote:
> On Thu, Oct 21, 2021 at 6:38 PM Damien Le Moal
> <damien.lemoal@opensource.wdc.com> wrote:
>>
>> On 2021/10/21 17:37, Sergey Shtylyov wrote:
>>> On 21.10.2021 8:57, Zheyu Ma wrote:
>>>
>>>> mv_init_host() propagates the value returned by mv_chip_id() which in turn
>>>> gets propagated by mv_pci_init_one() and hits local_pci_probe().
>>>>
>>>> During the process of driver probing, the probe function should return < 0
>>>> for failure, otherwise, the kernel will treat value > 0 as success.
>>>>
>>>> Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
>>>> ---
>>>>   drivers/ata/sata_mv.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/ata/sata_mv.c b/drivers/ata/sata_mv.c
>>>> index 9d86203e1e7a..7461fe078dd1 100644
>>>> --- a/drivers/ata/sata_mv.c
>>>> +++ b/drivers/ata/sata_mv.c
>>>> @@ -3897,7 +3897,7 @@ static int mv_chip_id(struct ata_host *host, unsigned int board_idx)
>>>>
>>>>      default:
>>>>              dev_err(host->dev, "BUG: invalid board index %u\n", board_idx);
>>>> -            return 1;
>>>> +            return -ENODEV;
>>>
>>>     Doesn't -EINVAL fit better here?
>>
>> If the error message is correct and this can only happen if there is a bug
>> somewhere, I do not think the error code really matters much. The dev_err()
>> should probably be changed to dev_alert() or even dev_crit() for this case.
>>
> 
> I don't think so, the error code does matter. If mv_chip_id() returns
> 1 which eventually causes the probe function to return 1, then the
> kernel will assume that the driver and the hardware match successfully
> (even if that is not the case), which will cause the following error
> if modprobe is called to remove the driver.
> 
> [   21.944486] general protection fault, probably for non-canonical
> address 0xdffffc000000001b: 0000 [#1] PREEMPT SMP KASAN PTI
> [   21.945317] KASAN: null-ptr-deref in range
> [0x00000000000000d8-0x00000000000000df]
> [   21.954442] Call Trace:
> [   21.954624]  ? scsi_remove_host+0x32/0x660
> [   21.954923]  ? lockdep_hardirqs_on+0x7e/0x110
> [   21.955240]  ? _raw_spin_unlock_irqrestore+0x30/0x60
> [   21.955634]  ? mutex_lock_io_nested+0x60/0x60
> [   21.956027]  ? _raw_spin_unlock_irqrestore+0x41/0x60
> [   21.956395]  ? async_synchronize_cookie_domain+0x35f/0x4a0
> [   21.956802]  ? async_synchronize_full_domain+0x20/0x20
> [   21.957179]  ? lock_release+0x63f/0x8f0
> [   21.957468]  mutex_lock_nested+0x1b/0x30
> [   21.957761]  scsi_remove_host+0x32/0x660
> [   21.958054]  ata_host_detach+0x75d/0x830
> [   21.958349]  ata_pci_remove_one+0x3b/0x40
> [   21.958649]  pci_device_remove+0xa9/0x250
> [   21.958949]  ? pci_device_probe+0x7d0/0x7d0
> [   21.959261]  device_release_driver_internal+0x4f7/0x7a0
> [   21.959647]  driver_detach+0x1e8/0x2c0
> [   21.959929]  bus_remove_driver+0x134/0x290
> [   21.960234]  ? sysfs_remove_groups+0x97/0xb0
> [   21.960552]  driver_unregister+0x77/0xa0
> [   21.960859]  pci_unregister_driver+0x2c/0x1c0
> [   21.961178]  cleanup_module+0x15/0x28 [sata_mv]

How do you trigger this? A bad device tree or something like that?

> 
> This is not the case if the correct error code is returned.
> 
>>>
>>> [...]
>>>
>>> MBR, Sergey
>>>
>>
>>
>> --
>> Damien Le Moal
>> Western Digital Research
> 
> Regards,
> Zheyu Ma
>
Zheyu Ma Oct. 22, 2021, 9:18 a.m. UTC | #6
On Fri, Oct 22, 2021 at 9:41 AM Damien Le Moal
<damien.lemoal@opensource.wdc.com> wrote:
>
> On 2021/10/21 20:23, Zheyu Ma wrote:
> > On Thu, Oct 21, 2021 at 6:38 PM Damien Le Moal
> > <damien.lemoal@opensource.wdc.com> wrote:
> >>
> >> On 2021/10/21 17:37, Sergey Shtylyov wrote:
> >>> On 21.10.2021 8:57, Zheyu Ma wrote:
> >>>
> >>>> mv_init_host() propagates the value returned by mv_chip_id() which in turn
> >>>> gets propagated by mv_pci_init_one() and hits local_pci_probe().
> >>>>
> >>>> During the process of driver probing, the probe function should return < 0
> >>>> for failure, otherwise, the kernel will treat value > 0 as success.
> >>>>
> >>>> Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
> >>>> ---
> >>>>   drivers/ata/sata_mv.c | 2 +-
> >>>>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/drivers/ata/sata_mv.c b/drivers/ata/sata_mv.c
> >>>> index 9d86203e1e7a..7461fe078dd1 100644
> >>>> --- a/drivers/ata/sata_mv.c
> >>>> +++ b/drivers/ata/sata_mv.c
> >>>> @@ -3897,7 +3897,7 @@ static int mv_chip_id(struct ata_host *host, unsigned int board_idx)
> >>>>
> >>>>      default:
> >>>>              dev_err(host->dev, "BUG: invalid board index %u\n", board_idx);
> >>>> -            return 1;
> >>>> +            return -ENODEV;
> >>>
> >>>     Doesn't -EINVAL fit better here?
> >>
> >> If the error message is correct and this can only happen if there is a bug
> >> somewhere, I do not think the error code really matters much. The dev_err()
> >> should probably be changed to dev_alert() or even dev_crit() for this case.
> >>
> >
> > I don't think so, the error code does matter. If mv_chip_id() returns
> > 1 which eventually causes the probe function to return 1, then the
> > kernel will assume that the driver and the hardware match successfully
> > (even if that is not the case), which will cause the following error
> > if modprobe is called to remove the driver.
> >
> > [   21.944486] general protection fault, probably for non-canonical
> > address 0xdffffc000000001b: 0000 [#1] PREEMPT SMP KASAN PTI
> > [   21.945317] KASAN: null-ptr-deref in range
> > [0x00000000000000d8-0x00000000000000df]
> > [   21.954442] Call Trace:
> > [   21.954624]  ? scsi_remove_host+0x32/0x660
> > [   21.954923]  ? lockdep_hardirqs_on+0x7e/0x110
> > [   21.955240]  ? _raw_spin_unlock_irqrestore+0x30/0x60
> > [   21.955634]  ? mutex_lock_io_nested+0x60/0x60
> > [   21.956027]  ? _raw_spin_unlock_irqrestore+0x41/0x60
> > [   21.956395]  ? async_synchronize_cookie_domain+0x35f/0x4a0
> > [   21.956802]  ? async_synchronize_full_domain+0x20/0x20
> > [   21.957179]  ? lock_release+0x63f/0x8f0
> > [   21.957468]  mutex_lock_nested+0x1b/0x30
> > [   21.957761]  scsi_remove_host+0x32/0x660
> > [   21.958054]  ata_host_detach+0x75d/0x830
> > [   21.958349]  ata_pci_remove_one+0x3b/0x40
> > [   21.958649]  pci_device_remove+0xa9/0x250
> > [   21.958949]  ? pci_device_probe+0x7d0/0x7d0
> > [   21.959261]  device_release_driver_internal+0x4f7/0x7a0
> > [   21.959647]  driver_detach+0x1e8/0x2c0
> > [   21.959929]  bus_remove_driver+0x134/0x290
> > [   21.960234]  ? sysfs_remove_groups+0x97/0xb0
> > [   21.960552]  driver_unregister+0x77/0xa0
> > [   21.960859]  pci_unregister_driver+0x2c/0x1c0
> > [   21.961178]  cleanup_module+0x15/0x28 [sata_mv]
>
> How do you trigger this ? A bad device tree or something like that ?

Pretty much. I was testing on QEMU and used fault injection to force
mv_chip_id() to fail, even though this rarely happens in practice.

Regards,
Zheyu Ma

Patch

diff --git a/drivers/ata/sata_mv.c b/drivers/ata/sata_mv.c
index 9d86203e1e7a..7461fe078dd1 100644
--- a/drivers/ata/sata_mv.c
+++ b/drivers/ata/sata_mv.c
@@ -3897,7 +3897,7 @@  static int mv_chip_id(struct ata_host *host, unsigned int board_idx)
 
 	default:
 		dev_err(host->dev, "BUG: invalid board index %u\n", board_idx);
-		return 1;
+		return -ENODEV;
 	}
 
 	hpriv->hp_flags = hp_flags;