diff mbox

libata: Whitelist SSDs that are known to properly return zeroes after TRIM

Message ID 1420727311-7066-1-git-send-email-martin.petersen@oracle.com
State Not Applicable
Delegated to: David Miller
Headers show

Commit Message

Martin K. Petersen Jan. 8, 2015, 2:28 p.m. UTC
As defined, the DRAT (Deterministic Read After Trim) and RZAT (Return
Zero After Trim) flags in the ATA Command Set are unreliable in the
sense that they only define what happens if the device successfully
executed the DSM TRIM command. TRIM is only advisory, however, and the
device is free to silently ignore all or parts of the request.

In practice this renders the DRAT and RZAT flags completely useless and
because the results are unpredictable we decided to disable discard in
MD for 3.18 to avoid the risk of data corruption.

Hardware vendors in the real world obviously need better guarantees than
what the standards bodies provide. Unfortuntely those guarantees are
encoded in product requirements documents rather than somewhere we can
key off of them programatically. So we are compelled to disabling
discard_zeroes_data for all devices unless we explicitly have data to
support whitelisting them.

This patch whitelists SSDs from a few of the main vendors. None of the
whitelists are based on written guarantees. They are purely based on
empirical evidence collected from internal and external users that have
tested or qualified these drives in RAID deployments.

The whitelist is only meant as a starting point and is by no means
comprehensive:

   - All intel SSD models except for 510
   - Micron M5?0/M600
   - Samsung SSDs
   - Seagate SSDs

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/ata/libata-core.c | 18 ++++++++++++++----
 drivers/ata/libata-scsi.c | 10 ++++++----
 include/linux/libata.h    |  1 +
 3 files changed, 21 insertions(+), 8 deletions(-)

Comments

Tejun Heo Jan. 8, 2015, 3:11 p.m. UTC | #1
Hello, Martin.

The patch looks okay to me.  Just one nit.

On Thu, Jan 08, 2015 at 09:28:31AM -0500, Martin K. Petersen wrote:
> @@ -4233,10 +4233,20 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
>  	{ "PIONEER DVD-RW  DVR-216D",	NULL,	ATA_HORKAGE_NOSETXFER },
>  
>  	/* devices that don't properly handle queued TRIM commands */
> -	{ "Micron_M500*",		NULL,	ATA_HORKAGE_NO_NCQ_TRIM, },
> -	{ "Crucial_CT???M500SSD*",	NULL,	ATA_HORKAGE_NO_NCQ_TRIM, },
> -	{ "Micron_M550*",		NULL,	ATA_HORKAGE_NO_NCQ_TRIM, },
> -	{ "Crucial_CT*M550SSD*",	NULL,	ATA_HORKAGE_NO_NCQ_TRIM, },
> +	{ "Micron_M[56]*",		NULL,	ATA_HORKAGE_NO_NCQ_TRIM |
> +						ATA_HORKAGE_ZERO_AFTER_TRIM, },
> +	{ "Crucial_CT*SSD*",		NULL,	ATA_HORKAGE_NO_NCQ_TRIM, },
> +
> +	/*
> +	 * DRAT/RZAT are weak guarantees. Explicitly black/whitelist
> +	 * SSDs that provide reliable zero after TRIM.
> +	 */

Can you please be more detailed on the above explanation?  It's not
like most people would be familiar with acronyms like DRAT/RZAT and
what the surrounding situation is like to require this sort of
whitelisting.

> +	{ "INTEL*SSDSC2MH*",		NULL,	0, }, /* Blacklist intel 510 */
> +	{ "INTEL*SSD*", 		NULL,	ATA_HORKAGE_ZERO_AFTER_TRIM, },

I think the above two can use a bit more verbose explanation too.

> +	{ "SSD*INTEL*",			NULL,	ATA_HORKAGE_ZERO_AFTER_TRIM, },
> +	{ "Samsung*SSD*",		NULL,	ATA_HORKAGE_ZERO_AFTER_TRIM, },
> +	{ "SAMSUNG*SSD*",		NULL,	ATA_HORKAGE_ZERO_AFTER_TRIM, },
> +	{ "ST[1248][0248]0[FH]*",	NULL,	ATA_HORKAGE_ZERO_AFTER_TRIM, },

Thanks.
Tim Small Jan. 8, 2015, 3:58 p.m. UTC | #2
On 08/01/15 14:28, Martin K. Petersen wrote:


> +	{ "Micron_M[56]*",		NULL,	ATA_HORKAGE_NO_NCQ_TRIM |
> +						ATA_HORKAGE_ZERO_AFTER_TRIM, },


A minor quibble - but all the other HORKAGE flags are used to identify
things which are broken or non-functional about a device (and this is
implicit with the use of the word 'horkage' IMO).

However, this proposed flag is trying to imply behaviour which is an
"extra feature" (beyond the requirements of the spec).  So the intention
is to whitelist using a mechanism which is otherwise used only been used
to blacklist.

I know it's difficult coming up with something which isn't too
wordy/verbose, but IMO any of:

ATA_HORKAGE_TRIM_ALWAYS_ZEROS

ATA_TRIM_ALWAYS_ZEROS

ATA_RELIABLE_ZERO_AFTER_TRIM


would seem clearer to me, because as the patch currently stands,
"ATA_HORKAGE_ZERO_AFTER_TRIM" implies to me "zero after trim is broken
on this device".

Particularly with the Micron excerpt quoted above I initially parsed
this as "don't issue tagged TRIM commands to this device, and don't
assume it'll read zeros after TRIM either".


Tim.
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Martin K. Petersen Jan. 9, 2015, 8:52 p.m. UTC | #3
>>>>> "Tim" == Tim Small <tim@seoss.co.uk> writes:

Tim> would seem clearer to me, because as the patch currently stands,
Tim> "ATA_HORKAGE_ZERO_AFTER_TRIM" implies to me "zero after trim is
Tim> broken on this device".

In SCSI we use often the term "quirk" regardless of whether we enable or
disable a feature. So "horkage" didn't really bother me much in this
context. But I'm happy to submit a name change patch if Tejun thinks
it's a valid concern.
Tejun Heo Jan. 9, 2015, 9:39 p.m. UTC | #4
On Fri, Jan 09, 2015 at 03:52:21PM -0500, Martin K. Petersen wrote:
> >>>>> "Tim" == Tim Small <tim@seoss.co.uk> writes:
> 
> Tim> would seem clearer to me, because as the patch currently stands,
> Tim> "ATA_HORKAGE_ZERO_AFTER_TRIM" implies to me "zero after trim is
> Tim> broken on this device".
> 
> In SCSI we use often the term "quirk" regardless of whether we enable or
> disable a feature. So "horkage" didn't really bother me much in this
> context. But I'm happy to submit a name change patch if Tejun thinks
> it's a valid concern.

I think it's fine.  It's not like its polarity is confusing - nobody
would read it as meaning the opposite.

Thanks.
diff mbox

Patch

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 5c84fb5c3372..7c03401dc5d8 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4233,10 +4233,20 @@  static const struct ata_blacklist_entry ata_device_blacklist [] = {
 	{ "PIONEER DVD-RW  DVR-216D",	NULL,	ATA_HORKAGE_NOSETXFER },
 
 	/* devices that don't properly handle queued TRIM commands */
-	{ "Micron_M500*",		NULL,	ATA_HORKAGE_NO_NCQ_TRIM, },
-	{ "Crucial_CT???M500SSD*",	NULL,	ATA_HORKAGE_NO_NCQ_TRIM, },
-	{ "Micron_M550*",		NULL,	ATA_HORKAGE_NO_NCQ_TRIM, },
-	{ "Crucial_CT*M550SSD*",	NULL,	ATA_HORKAGE_NO_NCQ_TRIM, },
+	{ "Micron_M[56]*",		NULL,	ATA_HORKAGE_NO_NCQ_TRIM |
+						ATA_HORKAGE_ZERO_AFTER_TRIM, },
+	{ "Crucial_CT*SSD*",		NULL,	ATA_HORKAGE_NO_NCQ_TRIM, },
+
+	/*
+	 * DRAT/RZAT are weak guarantees. Explicitly black/whitelist
+	 * SSDs that provide reliable zero after TRIM.
+	 */
+	{ "INTEL*SSDSC2MH*",		NULL,	0, }, /* Blacklist intel 510 */
+	{ "INTEL*SSD*", 		NULL,	ATA_HORKAGE_ZERO_AFTER_TRIM, },
+	{ "SSD*INTEL*",			NULL,	ATA_HORKAGE_ZERO_AFTER_TRIM, },
+	{ "Samsung*SSD*",		NULL,	ATA_HORKAGE_ZERO_AFTER_TRIM, },
+	{ "SAMSUNG*SSD*",		NULL,	ATA_HORKAGE_ZERO_AFTER_TRIM, },
+	{ "ST[1248][0248]0[FH]*",	NULL,	ATA_HORKAGE_ZERO_AFTER_TRIM, },
 
 	/*
 	 * Some WD SATA-I drives spin up and down erratically when the link
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 7659d6468303..280729325ebd 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -2532,13 +2532,15 @@  static unsigned int ata_scsiop_read_cap(struct ata_scsi_args *args, u8 *rbuf)
 		rbuf[15] = lowest_aligned;
 
 		if (ata_id_has_trim(args->id)) {
-			rbuf[14] |= 0x80; /* TPE */
+			rbuf[14] |= 0x80; /* LBPME */
 
-			if (ata_id_has_zero_after_trim(args->id))
-				rbuf[14] |= 0x40; /* TPRZ */
+			if (ata_id_has_zero_after_trim(args->id) &&
+			    dev->horkage & ATA_HORKAGE_ZERO_AFTER_TRIM) {
+				ata_dev_info(dev, "Enabling discard_zeroes_data\n");
+				rbuf[14] |= 0x40; /* LBPRZ */
+			}
 		}
 	}
-
 	return 0;
 }
 
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 2d182413b1db..f2b440e44fd7 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -422,6 +422,7 @@  enum {
 	ATA_HORKAGE_NO_NCQ_TRIM	= (1 << 19),	/* don't use queued TRIM */
 	ATA_HORKAGE_NOLPM	= (1 << 20),	/* don't use LPM */
 	ATA_HORKAGE_WD_BROKEN_LPM = (1 << 21),	/* some WDs have broken LPM */
+	ATA_HORKAGE_ZERO_AFTER_TRIM = (1 << 22),/* guarantees zero after trim */
 
 	 /* DMA mask for user DMA control: User visible values; DO NOT
 	    renumber */