diff mbox series

[v1,1/2] ata: sata_dwc_460ex: Fix crash due to OOB write

Message ID ba068938e629eb1a8b423a54405233e685cedb78.1647594132.git.chunkeey@gmail.com
State Superseded, archived
Headers show
Series [v1,1/2] ata: sata_dwc_460ex: Fix crash due to OOB write | expand

Commit Message

Christian Lamparter March 18, 2022, 9:03 a.m. UTC
the driver uses libata's "tag" values from in various arrays.
Since the mentioned patch bumped the ATA_TAG_INTERNAL to 32,
the value of the SATA_DWC_QCMD_MAX needs to be bumped to 33.

Otherwise ATA_TAG_INTERNAL cause a crash like this:

| BUG: Kernel NULL pointer dereference at 0x00000000
| Faulting instruction address: 0xc03ed4b8
| Oops: Kernel access of bad area, sig: 11 [#1]
| BE PAGE_SIZE=4K PowerPC 44x Platform
| CPU: 0 PID: 362 Comm: scsi_eh_1 Not tainted 5.4.163 #0
| NIP:  c03ed4b8 LR: c03d27e8 CTR: c03ed36c
| REGS: cfa59950 TRAP: 0300   Not tainted  (5.4.163)
| MSR:  00021000 <CE,ME>  CR: 42000222  XER: 00000000
| DEAR: 00000000 ESR: 00000000
| GPR00: c03d27e8 cfa59a08 cfa55fe0 00000000 0fa46bc0 [...]
| [..]
| NIP [c03ed4b8] sata_dwc_qc_issue+0x14c/0x254
| LR [c03d27e8] ata_qc_issue+0x1c8/0x2dc
| Call Trace:
| [cfa59a08] [c003f4e0] __cancel_work_timer+0x124/0x194 (unreliable)
| [cfa59a78] [c03d27e8] ata_qc_issue+0x1c8/0x2dc
| [cfa59a98] [c03d2b3c] ata_exec_internal_sg+0x240/0x524
| [cfa59b08] [c03d2e98] ata_exec_internal+0x78/0xe0
| [cfa59b58] [c03d30fc] ata_read_log_page.part.38+0x1dc/0x204
| [cfa59bc8] [c03d324c] ata_identify_page_supported+0x68/0x130
| [...]

this is because sata_dwc_dma_xfer_complete() NULLs the
dma_pending's next neighbour "chan" (a *dma_chan struct) in
this '32' case right here (line ~735):
> hsdevp->dma_pending[tag] = SATA_DWC_DMA_PENDING_NONE;

Then the next time, a dma gets issued; dma_dwc_xfer_setup() passes
the NULL'd hsdevp->chan to the dmaengine_slave_config() which then
causes the crash.

Reported-by: ticerex (OpenWrt Forum)
Fixes: 28361c403683c ("libata: add extra internal command")
Cc: stable@kernel.org # 4.18+
Link: https://forum.openwrt.org/t/my-book-live-duo-reboot-loop/122464
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
---
ticerex said when I've asked him about his real name+email for the patches:
"Please use my forum nick."
<https://forum.openwrt.org/t/my-book-live-duo-reboot-loop/122464/14>
(I know checkpatch.pl complains about that. But what can you do?)
---
 drivers/ata/sata_dwc_460ex.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Damien Le Moal March 18, 2022, 9:34 a.m. UTC | #1
On 3/18/22 18:03, Christian Lamparter wrote:
> the driver uses libata's "tag" values from in various arrays.
> Since the mentioned patch bumped the ATA_TAG_INTERNAL to 32,
> the value of the SATA_DWC_QCMD_MAX needs to be bumped to 33.
> 
> Otherwise ATA_TAG_INTERNAL cause a crash like this:
> 
> | BUG: Kernel NULL pointer dereference at 0x00000000
> | Faulting instruction address: 0xc03ed4b8
> | Oops: Kernel access of bad area, sig: 11 [#1]
> | BE PAGE_SIZE=4K PowerPC 44x Platform
> | CPU: 0 PID: 362 Comm: scsi_eh_1 Not tainted 5.4.163 #0
> | NIP:  c03ed4b8 LR: c03d27e8 CTR: c03ed36c
> | REGS: cfa59950 TRAP: 0300   Not tainted  (5.4.163)
> | MSR:  00021000 <CE,ME>  CR: 42000222  XER: 00000000
> | DEAR: 00000000 ESR: 00000000
> | GPR00: c03d27e8 cfa59a08 cfa55fe0 00000000 0fa46bc0 [...]
> | [..]
> | NIP [c03ed4b8] sata_dwc_qc_issue+0x14c/0x254
> | LR [c03d27e8] ata_qc_issue+0x1c8/0x2dc
> | Call Trace:
> | [cfa59a08] [c003f4e0] __cancel_work_timer+0x124/0x194 (unreliable)
> | [cfa59a78] [c03d27e8] ata_qc_issue+0x1c8/0x2dc
> | [cfa59a98] [c03d2b3c] ata_exec_internal_sg+0x240/0x524
> | [cfa59b08] [c03d2e98] ata_exec_internal+0x78/0xe0
> | [cfa59b58] [c03d30fc] ata_read_log_page.part.38+0x1dc/0x204
> | [cfa59bc8] [c03d324c] ata_identify_page_supported+0x68/0x130
> | [...]
> 
> this is because sata_dwc_dma_xfer_complete() NULLs the
> dma_pending's next neighbour "chan" (a *dma_chan struct) in
> this '32' case right here (line ~735):
>> hsdevp->dma_pending[tag] = SATA_DWC_DMA_PENDING_NONE;
> 
> Then the next time, a dma gets issued; dma_dwc_xfer_setup() passes
> the NULL'd hsdevp->chan to the dmaengine_slave_config() which then
> causes the crash.
> 
> Reported-by: ticerex (OpenWrt Forum)

I would remove this since you have the link to the bug report below.
Without a real name & email, this does not make much sense.

> Fixes: 28361c403683c ("libata: add extra internal command")
> Cc: stable@kernel.org # 4.18+
> Link: https://forum.openwrt.org/t/my-book-live-duo-reboot-loop/122464
> Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
> ---
> ticerex said when I've asked him about his real name+email for the patches:
> "Please use my forum nick."
> <https://forum.openwrt.org/t/my-book-live-duo-reboot-loop/122464/14>
> (I know checkpatch.pl complains about that. But what can you do?)
> ---
>  drivers/ata/sata_dwc_460ex.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/ata/sata_dwc_460ex.c b/drivers/ata/sata_dwc_460ex.c
> index bec33d781ae0..061b27584667 100644
> --- a/drivers/ata/sata_dwc_460ex.c
> +++ b/drivers/ata/sata_dwc_460ex.c
> @@ -137,7 +137,7 @@ struct sata_dwc_device {
>  #endif
>  };
>  
> -#define SATA_DWC_QCMD_MAX	32
> +#define SATA_DWC_QCMD_MAX	33

This is really ugly, but I do not see a better way to do it simply.
But at least, let's do something like this to avoid confusion and to show
that this driver is not doing some black magic with ATA drives:

/*
 * Allow one extra special slot for commands and DMA management to
 * account for libata internal commands.
 */
#define SATA_DWC_QCMD_MAX	(ATA_MAX_QUEUE + 1)

Thoughts ?

>  
>  struct sata_dwc_device_port {
>  	struct sata_dwc_device	*hsdev;
Christian Lamparter March 18, 2022, 11:43 a.m. UTC | #2
On 18/03/2022 10:34, Damien Le Moal wrote:
> On 3/18/22 18:03, Christian Lamparter wrote:
>> the driver uses libata's "tag" values from in various arrays.
>> Since the mentioned patch bumped the ATA_TAG_INTERNAL to 32,
>> the value of the SATA_DWC_QCMD_MAX needs to be bumped to 33.
>>
>> Otherwise ATA_TAG_INTERNAL cause a crash like this:
>>
>> | BUG: Kernel NULL pointer dereference at 0x00000000
>> | Faulting instruction address: 0xc03ed4b8
>> | Oops: Kernel access of bad area, sig: 11 [#1]
>> | BE PAGE_SIZE=4K PowerPC 44x Platform
>> | CPU: 0 PID: 362 Comm: scsi_eh_1 Not tainted 5.4.163 #0
>> | NIP:  c03ed4b8 LR: c03d27e8 CTR: c03ed36c
>> | REGS: cfa59950 TRAP: 0300   Not tainted  (5.4.163)
>> | MSR:  00021000 <CE,ME>  CR: 42000222  XER: 00000000
>> | DEAR: 00000000 ESR: 00000000
>> | GPR00: c03d27e8 cfa59a08 cfa55fe0 00000000 0fa46bc0 [...]
>> | [..]
>> | NIP [c03ed4b8] sata_dwc_qc_issue+0x14c/0x254
>> | LR [c03d27e8] ata_qc_issue+0x1c8/0x2dc
>> | Call Trace:
>> | [cfa59a08] [c003f4e0] __cancel_work_timer+0x124/0x194 (unreliable)
>> | [cfa59a78] [c03d27e8] ata_qc_issue+0x1c8/0x2dc
>> | [cfa59a98] [c03d2b3c] ata_exec_internal_sg+0x240/0x524
>> | [cfa59b08] [c03d2e98] ata_exec_internal+0x78/0xe0
>> | [cfa59b58] [c03d30fc] ata_read_log_page.part.38+0x1dc/0x204
>> | [cfa59bc8] [c03d324c] ata_identify_page_supported+0x68/0x130
>> | [...]
>>
>> this is because sata_dwc_dma_xfer_complete() NULLs the
>> dma_pending's next neighbour "chan" (a *dma_chan struct) in
>> this '32' case right here (line ~735):
>>> hsdevp->dma_pending[tag] = SATA_DWC_DMA_PENDING_NONE;
>>
>> Then the next time, a dma gets issued; dma_dwc_xfer_setup() passes
>> the NULL'd hsdevp->chan to the dmaengine_slave_config() which then
>> causes the crash.
>>
>> Reported-by: ticerex (OpenWrt Forum)
> 
> I would remove this since you have the link to the bug report below.
> Without a real name & email, this does not make much sense.
Ok.

> 
>> Fixes: 28361c403683c ("libata: add extra internal command")
>> Cc: stable@kernel.org # 4.18+
>> Link: https://forum.openwrt.org/t/my-book-live-duo-reboot-loop/122464
>> Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
>> ---
>> ticerex said when I've asked him about his real name+email for the patches:
>> "Please use my forum nick."
>> <https://forum.openwrt.org/t/my-book-live-duo-reboot-loop/122464/14>
>> (I know checkpatch.pl complains about that. But what can you do?)
>> ---
>>   drivers/ata/sata_dwc_460ex.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/ata/sata_dwc_460ex.c b/drivers/ata/sata_dwc_460ex.c
>> index bec33d781ae0..061b27584667 100644
>> --- a/drivers/ata/sata_dwc_460ex.c
>> +++ b/drivers/ata/sata_dwc_460ex.c
>> @@ -137,7 +137,7 @@ struct sata_dwc_device {
>>   #endif
>>   };
>>   
>> -#define SATA_DWC_QCMD_MAX	32
>> +#define SATA_DWC_QCMD_MAX	33
> 
> This is really ugly, but I do not see a better way to do it simply.
> But at least, let's do something like this to avoid confusion and to show
> that this driver is not doing some black magic with ATA drives:
> 
> /*
>   * Allow one extra special slot for commands and DMA management to
>   * account for libata internal commands.
>   */
> #define SATA_DWC_QCMD_MAX	(ATA_MAX_QUEUE + 1)
> 
> Thoughts ?

Yes, this works. That ATA_TAG_INTERNAL value has remained unchanged
since Jens' change from 2018. How do you want to proceed?

Should I make a v2, or do you update the patch locally?

The way I understand it. this ATA_TAG_INTERNAL has the special MAGIC
value so DMA "qc issues" do not interfere with possibly concurrent NCQ
"qc issues" on the involved controllers.

sata_dwc_460ex's NCQ is disabled/gimped by line 1093:
| /* .can_queue           = ATA_MAX_QUEUE, */)

Reassiging the ATA_TAG_INTERNAL could have been done.
But just increasing the arrays worked :-).

Cheers,
Christian
Andy Shevchenko March 18, 2022, 2:16 p.m. UTC | #3
On Fri, Mar 18, 2022 at 10:03:11AM +0100, Christian Lamparter wrote:
> the driver uses libata's "tag" values from in various arrays.
> Since the mentioned patch bumped the ATA_TAG_INTERNAL to 32,
> the value of the SATA_DWC_QCMD_MAX needs to be bumped to 33.
> 
> Otherwise ATA_TAG_INTERNAL cause a crash like this:
> 
> | BUG: Kernel NULL pointer dereference at 0x00000000
> | Faulting instruction address: 0xc03ed4b8
> | Oops: Kernel access of bad area, sig: 11 [#1]
> | BE PAGE_SIZE=4K PowerPC 44x Platform
> | CPU: 0 PID: 362 Comm: scsi_eh_1 Not tainted 5.4.163 #0
> | NIP:  c03ed4b8 LR: c03d27e8 CTR: c03ed36c
> | REGS: cfa59950 TRAP: 0300   Not tainted  (5.4.163)
> | MSR:  00021000 <CE,ME>  CR: 42000222  XER: 00000000
> | DEAR: 00000000 ESR: 00000000
> | GPR00: c03d27e8 cfa59a08 cfa55fe0 00000000 0fa46bc0 [...]
> | [..]
> | NIP [c03ed4b8] sata_dwc_qc_issue+0x14c/0x254
> | LR [c03d27e8] ata_qc_issue+0x1c8/0x2dc
> | Call Trace:
> | [cfa59a08] [c003f4e0] __cancel_work_timer+0x124/0x194 (unreliable)
> | [cfa59a78] [c03d27e8] ata_qc_issue+0x1c8/0x2dc
> | [cfa59a98] [c03d2b3c] ata_exec_internal_sg+0x240/0x524
> | [cfa59b08] [c03d2e98] ata_exec_internal+0x78/0xe0
> | [cfa59b58] [c03d30fc] ata_read_log_page.part.38+0x1dc/0x204
> | [cfa59bc8] [c03d324c] ata_identify_page_supported+0x68/0x130
> | [...]
> 
> this is because sata_dwc_dma_xfer_complete() NULLs the
> dma_pending's next neighbour "chan" (a *dma_chan struct) in
> this '32' case right here (line ~735):
> > hsdevp->dma_pending[tag] = SATA_DWC_DMA_PENDING_NONE;
> 
> Then the next time, a dma gets issued; dma_dwc_xfer_setup() passes
> the NULL'd hsdevp->chan to the dmaengine_slave_config() which then
> causes the crash.

...

> ticerex said when I've asked him about his real name+email for the patches:
> "Please use my forum nick."
> <https://forum.openwrt.org/t/my-book-live-duo-reboot-loop/122464/14>
> (I know checkpatch.pl complains about that. But what can you do?)

I think Reported-by: is fine to have any kind of reference to the reporter.
I can consider it false positive.

...

> -#define SATA_DWC_QCMD_MAX	32
> +#define SATA_DWC_QCMD_MAX	33

Can't we use 

#define SATA_DWC_QCMD_MAX	(ATA_TAG_INTERNAL + 1)

instead?
Christian Lamparter March 18, 2022, 6:06 p.m. UTC | #4
On 18/03/2022 15:16, Andy Shevchenko wrote:
> On Fri, Mar 18, 2022 at 10:03:11AM +0100, Christian Lamparter wrote:
>> the driver uses libata's "tag" values from in various arrays.
>> Since the mentioned patch bumped the ATA_TAG_INTERNAL to 32,
>> the value of the SATA_DWC_QCMD_MAX needs to be bumped to 33.
>>
>> Otherwise ATA_TAG_INTERNAL cause a crash like this:
>>
>> | BUG: Kernel NULL pointer dereference at 0x00000000
>> | Faulting instruction address: 0xc03ed4b8
>> | Oops: Kernel access of bad area, sig: 11 [#1]
>> | BE PAGE_SIZE=4K PowerPC 44x Platform
>> | CPU: 0 PID: 362 Comm: scsi_eh_1 Not tainted 5.4.163 #0
>> | NIP:  c03ed4b8 LR: c03d27e8 CTR: c03ed36c
>> | REGS: cfa59950 TRAP: 0300   Not tainted  (5.4.163)
>> | MSR:  00021000 <CE,ME>  CR: 42000222  XER: 00000000
>> | DEAR: 00000000 ESR: 00000000
>> | GPR00: c03d27e8 cfa59a08 cfa55fe0 00000000 0fa46bc0 [...]
>> | [..]
>> | NIP [c03ed4b8] sata_dwc_qc_issue+0x14c/0x254
>> | LR [c03d27e8] ata_qc_issue+0x1c8/0x2dc
>> | Call Trace:
>> | [cfa59a08] [c003f4e0] __cancel_work_timer+0x124/0x194 (unreliable)
>> | [cfa59a78] [c03d27e8] ata_qc_issue+0x1c8/0x2dc
>> | [cfa59a98] [c03d2b3c] ata_exec_internal_sg+0x240/0x524
>> | [cfa59b08] [c03d2e98] ata_exec_internal+0x78/0xe0
>> | [cfa59b58] [c03d30fc] ata_read_log_page.part.38+0x1dc/0x204
>> | [cfa59bc8] [c03d324c] ata_identify_page_supported+0x68/0x130
>> | [...]
>>
>> this is because sata_dwc_dma_xfer_complete() NULLs the
>> dma_pending's next neighbour "chan" (a *dma_chan struct) in
>> this '32' case right here (line ~735):
>>> hsdevp->dma_pending[tag] = SATA_DWC_DMA_PENDING_NONE;
>>
>> Then the next time, a dma gets issued; dma_dwc_xfer_setup() passes
>> the NULL'd hsdevp->chan to the dmaengine_slave_config() which then
>> causes the crash.
> 
> ...
> 
>> ticerex said when I've asked him about his real name+email for the patches:
>> "Please use my forum nick."
>> <https://forum.openwrt.org/t/my-book-live-duo-reboot-loop/122464/14>
>> (I know checkpatch.pl complains about that. But what can you do?)
> 
> I think Reported-by: is fine to have any kind of reference to the reporter.
> I can consider it false positive.
> 
> ...

K, I've reported this back to the reporter ;).

Documentation/process/maintainer-tip.rst and process/5.Posting.rst
provided some hints:

"Please note that if the bug was reported in private, then ask for
permission first before using the Reported-by tag."

and maintainer-tip.rst, the format should be:
``Reported-by: ``Reporter <reporter@mail>``

(My goal here is to get "a fix" merged, so conformance is key. ;-))

>> -#define SATA_DWC_QCMD_MAX	32
>> +#define SATA_DWC_QCMD_MAX	33
> 
> Can't we use
> 
> #define SATA_DWC_QCMD_MAX	(ATA_TAG_INTERNAL + 1)
> 

I've looked around a bit.

include/linux/libata.h itself has the following related definitions:

| enum {
|        ATA_MAX_QUEUE           = 32,
|        ATA_TAG_INTERNAL        = ATA_MAX_QUEUE,
| [..]

| struct ata_port {
|	[...]
| 	struct ata_queued_cmd   qcmd[ATA_MAX_QUEUE + 1];
|
| }

ATA_MAX_QUEUE seems to be the "size" value. Whereas ATA_TAG_INTERNAL
is used as (tag == ATA_TAG_INTERNAL) more often.


I came up with a viable compromise:

Would it be "OK" to define a "new" ATA_MAX_TAG?

This could be either set ATA_MAX_TAG = ATA_MAX_QUEUE + 1

or let it be assigned automatically, if it's placed after ATA_TAG_INTERNAL
in the libata.h's enum like this: (I prefer this, but it being "33" is not
obvious if you don't dabble in C all the time)

| enum {
|	[...]
|       ATA_MAX_QUEUE           = 32,
|       ATA_TAG_INTERNAL        = ATA_MAX_QUEUE,
|	/*
|	 * ATA_MAX_TAG accounts for ATA_MAX_QUEUE TAGs + 1 special TAG/slot
|	 * for ATA_TAG_INTERNAL (libata's internal commands/DMA management).
|	 * This needs to be it in this location.
|	 */
|	ATA_MAX_TAG,
|       [...]

This ATA_MAX_TAG could then be used for ata_port's qcmd (from above) and
sata_dwc_460ex's SATA_DWC_QCMD_MAX.

For good measure:

BUILD_BUG_ON(ATA_TAG_INTERNAL >= ATA_MAX_TAG);

could be added into libata.h's __ata_qc_from_tag().
It access the qcmd array, so anyone using libata's accessors will catch
future updates.

Is this fine or getting to close to being overbuild?

Thank you both for your insights,
Christian
Damien Le Moal March 19, 2022, midnight UTC | #5
On 3/19/22 03:06, Christian Lamparter wrote:
> On 18/03/2022 15:16, Andy Shevchenko wrote:
>> On Fri, Mar 18, 2022 at 10:03:11AM +0100, Christian Lamparter wrote:
>>> the driver uses libata's "tag" values from in various arrays.
>>> Since the mentioned patch bumped the ATA_TAG_INTERNAL to 32,
>>> the value of the SATA_DWC_QCMD_MAX needs to be bumped to 33.
>>>
>>> Otherwise ATA_TAG_INTERNAL cause a crash like this:
>>>
>>> | BUG: Kernel NULL pointer dereference at 0x00000000
>>> | Faulting instruction address: 0xc03ed4b8
>>> | Oops: Kernel access of bad area, sig: 11 [#1]
>>> | BE PAGE_SIZE=4K PowerPC 44x Platform
>>> | CPU: 0 PID: 362 Comm: scsi_eh_1 Not tainted 5.4.163 #0
>>> | NIP:  c03ed4b8 LR: c03d27e8 CTR: c03ed36c
>>> | REGS: cfa59950 TRAP: 0300   Not tainted  (5.4.163)
>>> | MSR:  00021000 <CE,ME>  CR: 42000222  XER: 00000000
>>> | DEAR: 00000000 ESR: 00000000
>>> | GPR00: c03d27e8 cfa59a08 cfa55fe0 00000000 0fa46bc0 [...]
>>> | [..]
>>> | NIP [c03ed4b8] sata_dwc_qc_issue+0x14c/0x254
>>> | LR [c03d27e8] ata_qc_issue+0x1c8/0x2dc
>>> | Call Trace:
>>> | [cfa59a08] [c003f4e0] __cancel_work_timer+0x124/0x194 (unreliable)
>>> | [cfa59a78] [c03d27e8] ata_qc_issue+0x1c8/0x2dc
>>> | [cfa59a98] [c03d2b3c] ata_exec_internal_sg+0x240/0x524
>>> | [cfa59b08] [c03d2e98] ata_exec_internal+0x78/0xe0
>>> | [cfa59b58] [c03d30fc] ata_read_log_page.part.38+0x1dc/0x204
>>> | [cfa59bc8] [c03d324c] ata_identify_page_supported+0x68/0x130
>>> | [...]
>>>
>>> this is because sata_dwc_dma_xfer_complete() NULLs the
>>> dma_pending's next neighbour "chan" (a *dma_chan struct) in
>>> this '32' case right here (line ~735):
>>>> hsdevp->dma_pending[tag] = SATA_DWC_DMA_PENDING_NONE;
>>>
>>> Then the next time, a dma gets issued; dma_dwc_xfer_setup() passes
>>> the NULL'd hsdevp->chan to the dmaengine_slave_config() which then
>>> causes the crash.
>>
>> ...
>>
>>> ticerex said when I've asked him about his real name+email for the patches:
>>> "Please use my forum nick."
>>> <https://forum.openwrt.org/t/my-book-live-duo-reboot-loop/122464/14>
>>> (I know checkpatch.pl complains about that. But what can you do?)
>>
>> I think Reported-by: is fine to have any kind of reference to the reporter.
>> I can consider it false positive.
>>
>> ...
> 
> K, I've reported this back to the reporter ;).
> 
> Documentation/process/maintainer-tip.rst and process/5.Posting.rst
> provided some hints:
> 
> "Please note that if the bug was reported in private, then ask for
> permission first before using the Reported-by tag."
> 
> and maintainer-tip.rst, the format should be:
> ``Reported-by: ``Reporter <reporter@mail>``
> 
> (My goal here is to get "a fix" merged, so conformance is key. ;-))
> 
>>> -#define SATA_DWC_QCMD_MAX	32
>>> +#define SATA_DWC_QCMD_MAX	33
>>
>> Can't we use
>>
>> #define SATA_DWC_QCMD_MAX	(ATA_TAG_INTERNAL + 1)
>>
> 
> I've looked around a bit.
> 
> include/linux/libata.h itself has the following related definitions:
> 
> | enum {
> |        ATA_MAX_QUEUE           = 32,
> |        ATA_TAG_INTERNAL        = ATA_MAX_QUEUE,
> | [..]
> 
> | struct ata_port {
> |	[...]
> | 	struct ata_queued_cmd   qcmd[ATA_MAX_QUEUE + 1];
> |
> | }
> 
> ATA_MAX_QUEUE seems to be the "size" value. Whereas ATA_TAG_INTERNAL
> is used as (tag == ATA_TAG_INTERNAL) more often.
> 
> 
> I came up with a viable compromise:
> 
> Would it be "OK" to define a "new" ATA_MAX_TAG?
> 
> This could be either set ATA_MAX_TAG = ATA_MAX_QUEUE + 1

That will be confusing !

ATA physically allows only up to 32 tags (0 to 31) because the command tag
field is only 5 bits. So we should not define anything like this defining
a tag value that is invalid.

ATA_TAG_INTERNAL is special and used only to identify that the command is
a libata internal command, either for device scan/revalidate or for error
handling. This value is *never* used as a tag for an NCQ command because
*all* internal commands are in fact not NCQ ! The internal commands from
libata are all non queueable commands. Meaning that if a device driver
sees a qc with its tag set to ATA_TAG_INTERNAL, it means that there are no
other commands on-going and all tags should be unused in the driver.

In the case of the dw driver, it means that we could arbitrarily use any
of the valid tag values for managing that internal command without any
issue. But having looked at the driver, as I said, it is a bigger change
than just faking a 33rd "tag" that is in fact not a command tag at all.

For these reasons, I really prefer defining SATA_DWC_QCMD_MAX as
(ATA_MAX_QUEUE + 1) with the comment above it clarifying why it is defined
like that.

> 
> or let it be assigned automatically, if it's placed after ATA_TAG_INTERNAL
> in the libata.h's enum like this: (I prefer this, but it being "33" is not
> obvious if you don't dabble in C all the time)
> 
> | enum {
> |	[...]
> |       ATA_MAX_QUEUE           = 32,
> |       ATA_TAG_INTERNAL        = ATA_MAX_QUEUE,
> |	/*
> |	 * ATA_MAX_TAG accounts for ATA_MAX_QUEUE TAGs + 1 special TAG/slot
> |	 * for ATA_TAG_INTERNAL (libata's internal commands/DMA management).
> |	 * This needs to be it in this location.
> |	 */
> |	ATA_MAX_TAG,
> |       [...]
> 
> This ATA_MAX_TAG could then be used for ata_port's qcmd (from above) and
> sata_dwc_460ex's SATA_DWC_QCMD_MAX.

I understand your point. But again, given that ATA_TAG_INTERNAL is only a
special tag value for libata and cannot be used by devices, I would rather
not define this ATA_MAX_TAG macro, to avoid confusions.

> 
> For good measure:
> 
> BUILD_BUG_ON(ATA_TAG_INTERNAL >= ATA_MAX_TAG);
> 
> could be added into libata.h's __ata_qc_from_tag().
> It access the qcmd array, so anyone using libata's accessors will catch
> future updates.
> 
> Is this fine or getting to close to being overbuild?

Not overbuilt, but not fine either. In my opinion, the root cause of the
issue is that the DW driver blindly uses the qc->tag value regardless of
if the command is NCQ or not. For non-ncq commands, tag 0 could be used
always for the command management arrays index, as by definition, non NCQ
commands imply QD=1 operation.

As I mentioned, this is a slightly bigger fix though. So I am OK with
simply changing the macro definition to have the +1 for now. But I am
certainly not against getting this driver correctly fixed to properly
handle command types and tags :)
Damien Le Moal March 19, 2022, 12:04 a.m. UTC | #6
On 3/18/22 20:43, Christian Lamparter wrote:
> On 18/03/2022 10:34, Damien Le Moal wrote:
>> On 3/18/22 18:03, Christian Lamparter wrote:
>>> the driver uses libata's "tag" values from in various arrays.
>>> Since the mentioned patch bumped the ATA_TAG_INTERNAL to 32,
>>> the value of the SATA_DWC_QCMD_MAX needs to be bumped to 33.
>>>
>>> Otherwise ATA_TAG_INTERNAL cause a crash like this:
>>>
>>> | BUG: Kernel NULL pointer dereference at 0x00000000
>>> | Faulting instruction address: 0xc03ed4b8
>>> | Oops: Kernel access of bad area, sig: 11 [#1]
>>> | BE PAGE_SIZE=4K PowerPC 44x Platform
>>> | CPU: 0 PID: 362 Comm: scsi_eh_1 Not tainted 5.4.163 #0
>>> | NIP:  c03ed4b8 LR: c03d27e8 CTR: c03ed36c
>>> | REGS: cfa59950 TRAP: 0300   Not tainted  (5.4.163)
>>> | MSR:  00021000 <CE,ME>  CR: 42000222  XER: 00000000
>>> | DEAR: 00000000 ESR: 00000000
>>> | GPR00: c03d27e8 cfa59a08 cfa55fe0 00000000 0fa46bc0 [...]
>>> | [..]
>>> | NIP [c03ed4b8] sata_dwc_qc_issue+0x14c/0x254
>>> | LR [c03d27e8] ata_qc_issue+0x1c8/0x2dc
>>> | Call Trace:
>>> | [cfa59a08] [c003f4e0] __cancel_work_timer+0x124/0x194 (unreliable)
>>> | [cfa59a78] [c03d27e8] ata_qc_issue+0x1c8/0x2dc
>>> | [cfa59a98] [c03d2b3c] ata_exec_internal_sg+0x240/0x524
>>> | [cfa59b08] [c03d2e98] ata_exec_internal+0x78/0xe0
>>> | [cfa59b58] [c03d30fc] ata_read_log_page.part.38+0x1dc/0x204
>>> | [cfa59bc8] [c03d324c] ata_identify_page_supported+0x68/0x130
>>> | [...]
>>>
>>> this is because sata_dwc_dma_xfer_complete() NULLs the
>>> dma_pending's next neighbour "chan" (a *dma_chan struct) in
>>> this '32' case right here (line ~735):
>>>> hsdevp->dma_pending[tag] = SATA_DWC_DMA_PENDING_NONE;
>>>
>>> Then the next time, a dma gets issued; dma_dwc_xfer_setup() passes
>>> the NULL'd hsdevp->chan to the dmaengine_slave_config() which then
>>> causes the crash.
>>>
>>> Reported-by: ticerex (OpenWrt Forum)
>>
>> I would remove this since you have the link to the bug report below.
>> Without a real name & email, this does not make much sense.
> Ok.
> 
>>
>>> Fixes: 28361c403683c ("libata: add extra internal command")
>>> Cc: stable@kernel.org # 4.18+
>>> Link: https://forum.openwrt.org/t/my-book-live-duo-reboot-loop/122464
>>> Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
>>> ---
>>> ticerex said when I've asked him about his real name+email for the patches:
>>> "Please use my forum nick."
>>> <https://forum.openwrt.org/t/my-book-live-duo-reboot-loop/122464/14>
>>> (I know checkpatch.pl complains about that. But what can you do?)
>>> ---
>>>   drivers/ata/sata_dwc_460ex.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/ata/sata_dwc_460ex.c b/drivers/ata/sata_dwc_460ex.c
>>> index bec33d781ae0..061b27584667 100644
>>> --- a/drivers/ata/sata_dwc_460ex.c
>>> +++ b/drivers/ata/sata_dwc_460ex.c
>>> @@ -137,7 +137,7 @@ struct sata_dwc_device {
>>>   #endif
>>>   };
>>>   
>>> -#define SATA_DWC_QCMD_MAX	32
>>> +#define SATA_DWC_QCMD_MAX	33
>>
>> This is really ugly, but I do not see a better way to do it simply.
>> But at least, let's do something like this to avoid confusion and to show
>> that this driver is not doing some black magic with ATA drives:
>>
>> /*
>>   * Allow one extra special slot for commands and DMA management to
>>   * account for libata internal commands.
>>   */
>> #define SATA_DWC_QCMD_MAX	(ATA_MAX_QUEUE + 1)
>>
>> Thoughts ?
> 
> Yes, this works. That ATA_TAG_INTERNAL value has remained unchanged
> since Jens' change from 2018. How do you want to proceed?
> 
> Should I make a v2, or do you update the patch locally?

Please send a v2. Or send another patch properly fixing the driver tag
handling as explained in my previous email.

> 
> The way I understand it. this ATA_TAG_INTERNAL has the special MAGIC
> value so DMA "qc issues" do not interfere with possibly concurrent NCQ
> "qc issues" on the involved controllers.
> 
> sata_dwc_460ex's NCQ is disabled/gimped by line 1093:
> | /* .can_queue           = ATA_MAX_QUEUE, */)
> 
> Reassiging the ATA_TAG_INTERNAL could have been done.
> But just increasing the arrays worked :-).

No ! changing can_queue to something > 32 would be very wrong ! This
really defines the number of tags that can be used, so the device maximum
queue depth. And ATA maximum queue depth is 32 with NCQ commands. It
cannot be anything larger than 32.

> 
> Cheers,
> Christian
Damien Le Moal March 19, 2022, 12:13 a.m. UTC | #7
On 3/19/22 03:06, Christian Lamparter wrote:
> On 18/03/2022 15:16, Andy Shevchenko wrote:
>> On Fri, Mar 18, 2022 at 10:03:11AM +0100, Christian Lamparter wrote:
>>> the driver uses libata's "tag" values from in various arrays.
>>> Since the mentioned patch bumped the ATA_TAG_INTERNAL to 32,
>>> the value of the SATA_DWC_QCMD_MAX needs to be bumped to 33.
>>>
>>> Otherwise ATA_TAG_INTERNAL cause a crash like this:
>>>
>>> | BUG: Kernel NULL pointer dereference at 0x00000000
>>> | Faulting instruction address: 0xc03ed4b8
>>> | Oops: Kernel access of bad area, sig: 11 [#1]
>>> | BE PAGE_SIZE=4K PowerPC 44x Platform
>>> | CPU: 0 PID: 362 Comm: scsi_eh_1 Not tainted 5.4.163 #0
>>> | NIP:  c03ed4b8 LR: c03d27e8 CTR: c03ed36c
>>> | REGS: cfa59950 TRAP: 0300   Not tainted  (5.4.163)
>>> | MSR:  00021000 <CE,ME>  CR: 42000222  XER: 00000000
>>> | DEAR: 00000000 ESR: 00000000
>>> | GPR00: c03d27e8 cfa59a08 cfa55fe0 00000000 0fa46bc0 [...]
>>> | [..]
>>> | NIP [c03ed4b8] sata_dwc_qc_issue+0x14c/0x254
>>> | LR [c03d27e8] ata_qc_issue+0x1c8/0x2dc
>>> | Call Trace:
>>> | [cfa59a08] [c003f4e0] __cancel_work_timer+0x124/0x194 (unreliable)
>>> | [cfa59a78] [c03d27e8] ata_qc_issue+0x1c8/0x2dc
>>> | [cfa59a98] [c03d2b3c] ata_exec_internal_sg+0x240/0x524
>>> | [cfa59b08] [c03d2e98] ata_exec_internal+0x78/0xe0
>>> | [cfa59b58] [c03d30fc] ata_read_log_page.part.38+0x1dc/0x204
>>> | [cfa59bc8] [c03d324c] ata_identify_page_supported+0x68/0x130
>>> | [...]
>>>
>>> this is because sata_dwc_dma_xfer_complete() NULLs the
>>> dma_pending's next neighbour "chan" (a *dma_chan struct) in
>>> this '32' case right here (line ~735):
>>>> hsdevp->dma_pending[tag] = SATA_DWC_DMA_PENDING_NONE;
>>>
>>> Then the next time, a dma gets issued; dma_dwc_xfer_setup() passes
>>> the NULL'd hsdevp->chan to the dmaengine_slave_config() which then
>>> causes the crash.
>>
>> ...
>>
>>> ticerex said when I've asked him about his real name+email for the patches:
>>> "Please use my forum nick."
>>> <https://forum.openwrt.org/t/my-book-live-duo-reboot-loop/122464/14>
>>> (I know checkpatch.pl complains about that. But what can you do?)
>>
>> I think Reported-by: is fine to have any kind of reference to the reporter.
>> I can consider it false positive.
>>
>> ...
> 
> K, I've reported this back to the reporter ;).
> 
> Documentation/process/maintainer-tip.rst and process/5.Posting.rst
> provided some hints:
> 
> "Please note that if the bug was reported in private, then ask for
> permission first before using the Reported-by tag."
> 
> and maintainer-tip.rst, the format should be:
> ``Reported-by: ``Reporter <reporter@mail>``
> 
> (My goal here is to get "a fix" merged, so conformance is key. ;-))

No worries, a fix will go in :)

Since the person did not want to give his/her/they real name & email, we
could consider this as that person not accepting to be mentioned in a
reported-by tag. So dropping the tag is I think preferred. Keeping it as
you had it is fine too. No big deal.

> 
>>> -#define SATA_DWC_QCMD_MAX	32
>>> +#define SATA_DWC_QCMD_MAX	33
>>
>> Can't we use
>>
>> #define SATA_DWC_QCMD_MAX	(ATA_TAG_INTERNAL + 1)
>>
> 
> I've looked around a bit.
> 
> include/linux/libata.h itself has the following related definitions:
> 
> | enum {
> |        ATA_MAX_QUEUE           = 32,
> |        ATA_TAG_INTERNAL        = ATA_MAX_QUEUE,
> | [..]
> 
> | struct ata_port {
> |	[...]
> | 	struct ata_queued_cmd   qcmd[ATA_MAX_QUEUE + 1];
> |
> | }
> 
> ATA_MAX_QUEUE seems to be the "size" value. Whereas ATA_TAG_INTERNAL
> is used as (tag == ATA_TAG_INTERNAL) more often.
> 
> 
> I came up with a viable compromise:
> 
> Would it be "OK" to define a "new" ATA_MAX_TAG?
> 
> This could be either set ATA_MAX_TAG = ATA_MAX_QUEUE + 1
> 
> or let it be assigned automatically, if it's placed after ATA_TAG_INTERNAL
> in the libata.h's enum like this: (I prefer this, but it being "33" is not
> obvious if you don't dabble in C all the time)
> 
> | enum {
> |	[...]
> |       ATA_MAX_QUEUE           = 32,
> |       ATA_TAG_INTERNAL        = ATA_MAX_QUEUE,
> |	/*
> |	 * ATA_MAX_TAG accounts for ATA_MAX_QUEUE TAGs + 1 special TAG/slot
> |	 * for ATA_TAG_INTERNAL (libata's internal commands/DMA management).
> |	 * This needs to be it in this location.
> |	 */
> |	ATA_MAX_TAG,
> |       [...]
> 
> This ATA_MAX_TAG could then be used for ata_port's qcmd (from above) and
> sata_dwc_460ex's SATA_DWC_QCMD_MAX.
> 
> For good measure:
> 
> BUILD_BUG_ON(ATA_TAG_INTERNAL >= ATA_MAX_TAG);
> 
> could be added into libata.h's __ata_qc_from_tag().
> It access the qcmd array, so anyone using libata's accessors will catch
> future updates.
> 
> Is this fine or getting to close to being overbuild?
> 
> Thank you both for your insights,
> Christian
diff mbox series

Patch

diff --git a/drivers/ata/sata_dwc_460ex.c b/drivers/ata/sata_dwc_460ex.c
index bec33d781ae0..061b27584667 100644
--- a/drivers/ata/sata_dwc_460ex.c
+++ b/drivers/ata/sata_dwc_460ex.c
@@ -137,7 +137,7 @@  struct sata_dwc_device {
 #endif
 };
 
-#define SATA_DWC_QCMD_MAX	32
+#define SATA_DWC_QCMD_MAX	33
 
 struct sata_dwc_device_port {
 	struct sata_dwc_device	*hsdev;