diff mbox series

libata-sata: Check SDB_FIS for completion of DMA transfer before completing the commands.

Message ID 20240223060752.4926-1-skashyap@marvell.com
State New
Headers show
Series libata-sata: Check SDB_FIS for completion of DMA transfer before completing the commands. | expand

Commit Message

Saurav Kashyap Feb. 23, 2024, 6:07 a.m. UTC
Sequence leading to an issue as per PCIe trace
- PxSact is read with slots 7 and 24 being cleared.
- Host starts processing these commands while data is not in system
  memory yet.
- Last pkt of 512B was sent to host.
- SDB.FIS is copied, telling host command slot 24 is done.

Cc: Soochon Radee <soochon@google.com>
Tested-by: Manoj Phadtare <mphadtare@marvell.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
---
 drivers/ata/libata-sata.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

Comments

Damien Le Moal Feb. 23, 2024, 6:38 a.m. UTC | #1
On 2/23/24 15:07, Saurav Kashyap wrote:
> Sequence leading to an issue as per PCIe trace
> - PxSact is read with slots 7 and 24 being cleared.
> - Host starts processing these commands while data is not in system
>   memory yet.

What ? If the DMA transfers are not completed yet why is the adapter clearing
sact ? That sounds like a very nasty HW bug.

> - Last pkt of 512B was sent to host.
> - SDB.FIS is copied, telling host command slot 24 is done.

And then what ? could you please descibe all of this in more detail ? What
workloads was this ? What is the device used ? etc. This commit messsage is way
too short to describe what seems to be a very serious bug with an adapter that
you are not even mentioning here. Please expand this description.

> 
> Cc: Soochon Radee <soochon@google.com>
> Tested-by: Manoj Phadtare <mphadtare@marvell.com>
> Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
> ---
>  drivers/ata/libata-sata.c | 28 +++++++++++++++++++++++++++-
>  1 file changed, 27 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
> index b6656c287175..b2310f3a2a02 100644
> --- a/drivers/ata/libata-sata.c
> +++ b/drivers/ata/libata-sata.c
> @@ -14,9 +14,11 @@
>  #include <scsi/scsi_eh.h>
>  #include <linux/libata.h>
>  #include <asm/unaligned.h>
> +#include <linux/pci.h>
>  
>  #include "libata.h"
>  #include "libata-transport.h"
> +#include "ahci.h"
>  
>  /* debounce timing parameters in msecs { interval, duration, timeout } */
>  const unsigned int sata_deb_timing_normal[]		= {   5,  100, 2000 };
> @@ -649,6 +651,7 @@ EXPORT_SYMBOL_GPL(sata_link_hardreset);
>  int ata_qc_complete_multiple(struct ata_port *ap, u64 qc_active)
>  {
>  	u64 done_mask, ap_qc_active = ap->qc_active;
> +	struct pci_dev *pdev = to_pci_dev(ap->host->dev);
>  	int nr_done = 0;
>  
>  	/*
> @@ -677,7 +680,30 @@ int ata_qc_complete_multiple(struct ata_port *ap, u64 qc_active)
>  		unsigned int tag = __ffs64(done_mask);
>  
>  		qc = ata_qc_from_tag(ap, tag);
> -		if (qc) {
> +		if (pdev->vendor == PCI_VENDOR_ID_MARVELL_EXT &&
> +		    (pdev->device == 0x9215 || pdev->device == 0x9235)) {
> +			struct ahci_port_priv *pp = ap->private_data;
> +			u8 *rx_fis = pp->rx_fis;
> +
> +			if (pp->fbs_enabled)
> +				rx_fis += ap->link.pmp * AHCI_RX_FIS_SZ;
> +
> +			if (!qc)
> +				continue;
> +
> +			if (ata_is_ncq(qc->tf.protocol)) {
> +				u32 *fis = (u32 *)(rx_fis + RX_FIS_SDB);
> +				u32 fis_active = fis[1];
> +
> +				if ((fis_active & (1 << tag))) {
> +					ata_qc_complete(qc);
> +					nr_done++;
> +				}
> +			} else {
> +				ata_qc_complete(qc);
> +				nr_done++;
> +			}
> +		} else if (qc) {
>  			ata_qc_complete(qc);
>  			nr_done++;
>  		}
Damien Le Moal Feb. 23, 2024, 6:40 a.m. UTC | #2
On 2/23/24 15:07, Saurav Kashyap wrote:
> Sequence leading to an issue as per PCIe trace
> - PxSact is read with slots 7 and 24 being cleared.
> - Host starts processing these commands while data is not in system
>   memory yet.
> - Last pkt of 512B was sent to host.
> - SDB.FIS is copied, telling host command slot 24 is done.

And send patches to the maintainers as well as to the list.

scripts/get_maintainer.pl drivers/ata/
Saurav Kashyap Feb. 29, 2024, 3:52 a.m. UTC | #3
Hi Damien,
Thanks for comments, I will submit v2.

Thanks,
~Saurav

> -----Original Message-----
> From: Damien Le Moal <dlemoal@kernel.org>
> Sent: Friday, February 23, 2024 12:09 PM
> To: Saurav Kashyap <skashyap@marvell.com>; linux-ide@vger.kernel.org
> Cc: soochon@google.com; Manoj Phadtare <mphadtare@marvell.com>
> Subject: [EXT] Re: [PATCH] libata-sata: Check SDB_FIS for completion of DMA
> transfer before completing the commands.
> 
> External Email
> 
> ----------------------------------------------------------------------
> On 2/23/24 15:07, Saurav Kashyap wrote:
> > Sequence leading to an issue as per PCIe trace
> > - PxSact is read with slots 7 and 24 being cleared.
> > - Host starts processing these commands while data is not in system
> >   memory yet.
> 
> What ? If the DMA transfers are not completed yet why is the adapter clearing
> sact ? That sounds like a very nasty HW bug.
> 
> > - Last pkt of 512B was sent to host.
> > - SDB.FIS is copied, telling host command slot 24 is done.
> 
> And then what ? could you please descibe all of this in more detail ? What
> workloads was this ? What is the device used ? etc. This commit messsage is way
> too short to describe what seems to be a very serious bug with an adapter that
> you are not even mentioning here. Please expand this description.
> 
> >
> > Cc: Soochon Radee <soochon@google.com>
> > Tested-by: Manoj Phadtare <mphadtare@marvell.com>
> > Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
> > ---
> >  drivers/ata/libata-sata.c | 28 +++++++++++++++++++++++++++-
> >  1 file changed, 27 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
> > index b6656c287175..b2310f3a2a02 100644
> > --- a/drivers/ata/libata-sata.c
> > +++ b/drivers/ata/libata-sata.c
> > @@ -14,9 +14,11 @@
> >  #include <scsi/scsi_eh.h>
> >  #include <linux/libata.h>
> >  #include <asm/unaligned.h>
> > +#include <linux/pci.h>
> >
> >  #include "libata.h"
> >  #include "libata-transport.h"
> > +#include "ahci.h"
> >
> >  /* debounce timing parameters in msecs { interval, duration, timeout } */
> >  const unsigned int sata_deb_timing_normal[]		= {   5,  100, 2000 };
> > @@ -649,6 +651,7 @@ EXPORT_SYMBOL_GPL(sata_link_hardreset);
> >  int ata_qc_complete_multiple(struct ata_port *ap, u64 qc_active)
> >  {
> >  	u64 done_mask, ap_qc_active = ap->qc_active;
> > +	struct pci_dev *pdev = to_pci_dev(ap->host->dev);
> >  	int nr_done = 0;
> >
> >  	/*
> > @@ -677,7 +680,30 @@ int ata_qc_complete_multiple(struct ata_port *ap,
> u64 qc_active)
> >  		unsigned int tag = __ffs64(done_mask);
> >
> >  		qc = ata_qc_from_tag(ap, tag);
> > -		if (qc) {
> > +		if (pdev->vendor == PCI_VENDOR_ID_MARVELL_EXT &&
> > +		    (pdev->device == 0x9215 || pdev->device == 0x9235)) {
> > +			struct ahci_port_priv *pp = ap->private_data;
> > +			u8 *rx_fis = pp->rx_fis;
> > +
> > +			if (pp->fbs_enabled)
> > +				rx_fis += ap->link.pmp * AHCI_RX_FIS_SZ;
> > +
> > +			if (!qc)
> > +				continue;
> > +
> > +			if (ata_is_ncq(qc->tf.protocol)) {
> > +				u32 *fis = (u32 *)(rx_fis + RX_FIS_SDB);
> > +				u32 fis_active = fis[1];
> > +
> > +				if ((fis_active & (1 << tag))) {
> > +					ata_qc_complete(qc);
> > +					nr_done++;
> > +				}
> > +			} else {
> > +				ata_qc_complete(qc);
> > +				nr_done++;
> > +			}
> > +		} else if (qc) {
> >  			ata_qc_complete(qc);
> >  			nr_done++;
> >  		}
> 
> --
> Damien Le Moal
> Western Digital Research
diff mbox series

Patch

diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
index b6656c287175..b2310f3a2a02 100644
--- a/drivers/ata/libata-sata.c
+++ b/drivers/ata/libata-sata.c
@@ -14,9 +14,11 @@ 
 #include <scsi/scsi_eh.h>
 #include <linux/libata.h>
 #include <asm/unaligned.h>
+#include <linux/pci.h>
 
 #include "libata.h"
 #include "libata-transport.h"
+#include "ahci.h"
 
 /* debounce timing parameters in msecs { interval, duration, timeout } */
 const unsigned int sata_deb_timing_normal[]		= {   5,  100, 2000 };
@@ -649,6 +651,7 @@  EXPORT_SYMBOL_GPL(sata_link_hardreset);
 int ata_qc_complete_multiple(struct ata_port *ap, u64 qc_active)
 {
 	u64 done_mask, ap_qc_active = ap->qc_active;
+	struct pci_dev *pdev = to_pci_dev(ap->host->dev);
 	int nr_done = 0;
 
 	/*
@@ -677,7 +680,30 @@  int ata_qc_complete_multiple(struct ata_port *ap, u64 qc_active)
 		unsigned int tag = __ffs64(done_mask);
 
 		qc = ata_qc_from_tag(ap, tag);
-		if (qc) {
+		if (pdev->vendor == PCI_VENDOR_ID_MARVELL_EXT &&
+		    (pdev->device == 0x9215 || pdev->device == 0x9235)) {
+			struct ahci_port_priv *pp = ap->private_data;
+			u8 *rx_fis = pp->rx_fis;
+
+			if (pp->fbs_enabled)
+				rx_fis += ap->link.pmp * AHCI_RX_FIS_SZ;
+
+			if (!qc)
+				continue;
+
+			if (ata_is_ncq(qc->tf.protocol)) {
+				u32 *fis = (u32 *)(rx_fis + RX_FIS_SDB);
+				u32 fis_active = fis[1];
+
+				if ((fis_active & (1 << tag))) {
+					ata_qc_complete(qc);
+					nr_done++;
+				}
+			} else {
+				ata_qc_complete(qc);
+				nr_done++;
+			}
+		} else if (qc) {
 			ata_qc_complete(qc);
 			nr_done++;
 		}