Message ID | 20240223060752.4926-1-skashyap@marvell.com |
---|---|
State | New |
Headers | show |
Series | libata-sata: Check SDB_FIS for completion of DMA transfer before completing the commands. | expand |
On 2/23/24 15:07, Saurav Kashyap wrote: > Sequence leading to an issue as per PCIe trace > - PxSact is read with slots 7 and 24 being cleared. > - Host starts processing these commands while data is not in system > memory yet. What ? If the DMA transfers are not completed yet why is the adapter clearing sact ? That sounds like a very nasty HW bug. > - Last pkt of 512B was sent to host. > - SDB.FIS is copied, telling host command slot 24 is done. And then what ? could you please descibe all of this in more detail ? What workloads was this ? What is the device used ? etc. This commit messsage is way too short to describe what seems to be a very serious bug with an adapter that you are not even mentioning here. Please expand this description. > > Cc: Soochon Radee <soochon@google.com> > Tested-by: Manoj Phadtare <mphadtare@marvell.com> > Signed-off-by: Saurav Kashyap <skashyap@marvell.com> > --- > drivers/ata/libata-sata.c | 28 +++++++++++++++++++++++++++- > 1 file changed, 27 insertions(+), 1 deletion(-) > > diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c > index b6656c287175..b2310f3a2a02 100644 > --- a/drivers/ata/libata-sata.c > +++ b/drivers/ata/libata-sata.c > @@ -14,9 +14,11 @@ > #include <scsi/scsi_eh.h> > #include <linux/libata.h> > #include <asm/unaligned.h> > +#include <linux/pci.h> > > #include "libata.h" > #include "libata-transport.h" > +#include "ahci.h" > > /* debounce timing parameters in msecs { interval, duration, timeout } */ > const unsigned int sata_deb_timing_normal[] = { 5, 100, 2000 }; > @@ -649,6 +651,7 @@ EXPORT_SYMBOL_GPL(sata_link_hardreset); > int ata_qc_complete_multiple(struct ata_port *ap, u64 qc_active) > { > u64 done_mask, ap_qc_active = ap->qc_active; > + struct pci_dev *pdev = to_pci_dev(ap->host->dev); > int nr_done = 0; > > /* > @@ -677,7 +680,30 @@ int ata_qc_complete_multiple(struct ata_port *ap, u64 qc_active) > unsigned int tag = __ffs64(done_mask); > > qc = ata_qc_from_tag(ap, tag); > - if (qc) { > + if (pdev->vendor == PCI_VENDOR_ID_MARVELL_EXT && > + (pdev->device == 0x9215 || pdev->device == 0x9235)) { > + struct ahci_port_priv *pp = ap->private_data; > + u8 *rx_fis = pp->rx_fis; > + > + if (pp->fbs_enabled) > + rx_fis += ap->link.pmp * AHCI_RX_FIS_SZ; > + > + if (!qc) > + continue; > + > + if (ata_is_ncq(qc->tf.protocol)) { > + u32 *fis = (u32 *)(rx_fis + RX_FIS_SDB); > + u32 fis_active = fis[1]; > + > + if ((fis_active & (1 << tag))) { > + ata_qc_complete(qc); > + nr_done++; > + } > + } else { > + ata_qc_complete(qc); > + nr_done++; > + } > + } else if (qc) { > ata_qc_complete(qc); > nr_done++; > }
On 2/23/24 15:07, Saurav Kashyap wrote: > Sequence leading to an issue as per PCIe trace > - PxSact is read with slots 7 and 24 being cleared. > - Host starts processing these commands while data is not in system > memory yet. > - Last pkt of 512B was sent to host. > - SDB.FIS is copied, telling host command slot 24 is done. And send patches to the maintainers as well as to the list. scripts/get_maintainer.pl drivers/ata/
Hi Damien, Thanks for comments, I will submit v2. Thanks, ~Saurav > -----Original Message----- > From: Damien Le Moal <dlemoal@kernel.org> > Sent: Friday, February 23, 2024 12:09 PM > To: Saurav Kashyap <skashyap@marvell.com>; linux-ide@vger.kernel.org > Cc: soochon@google.com; Manoj Phadtare <mphadtare@marvell.com> > Subject: [EXT] Re: [PATCH] libata-sata: Check SDB_FIS for completion of DMA > transfer before completing the commands. > > External Email > > ---------------------------------------------------------------------- > On 2/23/24 15:07, Saurav Kashyap wrote: > > Sequence leading to an issue as per PCIe trace > > - PxSact is read with slots 7 and 24 being cleared. > > - Host starts processing these commands while data is not in system > > memory yet. > > What ? If the DMA transfers are not completed yet why is the adapter clearing > sact ? That sounds like a very nasty HW bug. > > > - Last pkt of 512B was sent to host. > > - SDB.FIS is copied, telling host command slot 24 is done. > > And then what ? could you please descibe all of this in more detail ? What > workloads was this ? What is the device used ? etc. This commit messsage is way > too short to describe what seems to be a very serious bug with an adapter that > you are not even mentioning here. Please expand this description. > > > > > Cc: Soochon Radee <soochon@google.com> > > Tested-by: Manoj Phadtare <mphadtare@marvell.com> > > Signed-off-by: Saurav Kashyap <skashyap@marvell.com> > > --- > > drivers/ata/libata-sata.c | 28 +++++++++++++++++++++++++++- > > 1 file changed, 27 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c > > index b6656c287175..b2310f3a2a02 100644 > > --- a/drivers/ata/libata-sata.c > > +++ b/drivers/ata/libata-sata.c > > @@ -14,9 +14,11 @@ > > #include <scsi/scsi_eh.h> > > #include <linux/libata.h> > > #include <asm/unaligned.h> > > +#include <linux/pci.h> > > > > #include "libata.h" > > #include "libata-transport.h" > > +#include "ahci.h" > > > > /* debounce timing parameters in msecs { interval, duration, timeout } */ > > const unsigned int sata_deb_timing_normal[] = { 5, 100, 2000 }; > > @@ -649,6 +651,7 @@ EXPORT_SYMBOL_GPL(sata_link_hardreset); > > int ata_qc_complete_multiple(struct ata_port *ap, u64 qc_active) > > { > > u64 done_mask, ap_qc_active = ap->qc_active; > > + struct pci_dev *pdev = to_pci_dev(ap->host->dev); > > int nr_done = 0; > > > > /* > > @@ -677,7 +680,30 @@ int ata_qc_complete_multiple(struct ata_port *ap, > u64 qc_active) > > unsigned int tag = __ffs64(done_mask); > > > > qc = ata_qc_from_tag(ap, tag); > > - if (qc) { > > + if (pdev->vendor == PCI_VENDOR_ID_MARVELL_EXT && > > + (pdev->device == 0x9215 || pdev->device == 0x9235)) { > > + struct ahci_port_priv *pp = ap->private_data; > > + u8 *rx_fis = pp->rx_fis; > > + > > + if (pp->fbs_enabled) > > + rx_fis += ap->link.pmp * AHCI_RX_FIS_SZ; > > + > > + if (!qc) > > + continue; > > + > > + if (ata_is_ncq(qc->tf.protocol)) { > > + u32 *fis = (u32 *)(rx_fis + RX_FIS_SDB); > > + u32 fis_active = fis[1]; > > + > > + if ((fis_active & (1 << tag))) { > > + ata_qc_complete(qc); > > + nr_done++; > > + } > > + } else { > > + ata_qc_complete(qc); > > + nr_done++; > > + } > > + } else if (qc) { > > ata_qc_complete(qc); > > nr_done++; > > } > > -- > Damien Le Moal > Western Digital Research
diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c index b6656c287175..b2310f3a2a02 100644 --- a/drivers/ata/libata-sata.c +++ b/drivers/ata/libata-sata.c @@ -14,9 +14,11 @@ #include <scsi/scsi_eh.h> #include <linux/libata.h> #include <asm/unaligned.h> +#include <linux/pci.h> #include "libata.h" #include "libata-transport.h" +#include "ahci.h" /* debounce timing parameters in msecs { interval, duration, timeout } */ const unsigned int sata_deb_timing_normal[] = { 5, 100, 2000 }; @@ -649,6 +651,7 @@ EXPORT_SYMBOL_GPL(sata_link_hardreset); int ata_qc_complete_multiple(struct ata_port *ap, u64 qc_active) { u64 done_mask, ap_qc_active = ap->qc_active; + struct pci_dev *pdev = to_pci_dev(ap->host->dev); int nr_done = 0; /* @@ -677,7 +680,30 @@ int ata_qc_complete_multiple(struct ata_port *ap, u64 qc_active) unsigned int tag = __ffs64(done_mask); qc = ata_qc_from_tag(ap, tag); - if (qc) { + if (pdev->vendor == PCI_VENDOR_ID_MARVELL_EXT && + (pdev->device == 0x9215 || pdev->device == 0x9235)) { + struct ahci_port_priv *pp = ap->private_data; + u8 *rx_fis = pp->rx_fis; + + if (pp->fbs_enabled) + rx_fis += ap->link.pmp * AHCI_RX_FIS_SZ; + + if (!qc) + continue; + + if (ata_is_ncq(qc->tf.protocol)) { + u32 *fis = (u32 *)(rx_fis + RX_FIS_SDB); + u32 fis_active = fis[1]; + + if ((fis_active & (1 << tag))) { + ata_qc_complete(qc); + nr_done++; + } + } else { + ata_qc_complete(qc); + nr_done++; + } + } else if (qc) { ata_qc_complete(qc); nr_done++; }