Patchwork [3/4] libata: Report zeroed read after Trim and max discard size

login
register
mail settings
Submitter Martin K. Petersen
Date Nov. 21, 2009, 2:45 a.m.
Message ID <1258771524-26673-4-git-send-email-martin.petersen@oracle.com>
Download mbox | patch
Permalink /patch/38955/
State Not Applicable
Delegated to: David Miller
Headers show

Comments

Martin K. Petersen - Nov. 21, 2009, 2:45 a.m.
Our current Trim payload is a single sector that can accommodate 64 *
65535 blocks being unmapped.  Report this value in the Block Limits
Maximum Unmap LBA count field.

If a storage device supports Trim and the DRAT and RZAT bits are set,
report TPRZ=1 in Read Capacity(16).

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 drivers/ata/libata-scsi.c |   12 +++++++++---
 include/linux/ata.h       |   10 ++++++++++
 2 files changed, 19 insertions(+), 3 deletions(-)
Christoph Hellwig - Nov. 21, 2009, 10:49 a.m.
On Fri, Nov 20, 2009 at 09:45:23PM -0500, Martin K. Petersen wrote:
>  	 * with the unmap bit set.
>  	 */
> -	if (ata_id_has_trim(args->id))
> +	if (ata_id_has_trim(args->id)) {
> +		put_unaligned_be32(65535 * 512 / 8, &rbuf[20]);
>  		put_unaligned_be32(1, &rbuf[28]);

My reading of SPC is that the max unmap size only makes sense for
devices supporting UNMAP, while the SATL for now only supports WRITE
SAME with the unmap bit.

> -			rbuf[14] |= 0x80;
> +		if (ata_id_has_trim(args->id)) {
> +			rbuf[14] |= 0x80; /* TPE */
> +
> +			if (ata_id_has_zero_after_trim(args->id))
> +				rbuf[14] |= 0x40; /* TPRZ */

Looks good.

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Martin K. Petersen - Nov. 21, 2009, 8:16 p.m.
>>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes:

Christoph> On Fri, Nov 20, 2009 at 09:45:23PM -0500, Martin K. Petersen wrote:
>> * with the unmap bit set.
>> */
>> - if (ata_id_has_trim(args->id))
>> + if (ata_id_has_trim(args->id)) {
>> + put_unaligned_be32(65535 * 512 / 8, &rbuf[20]);
>> put_unaligned_be32(1, &rbuf[28]);

Christoph> My reading of SPC is that the max unmap size only makes sense
Christoph> for devices supporting UNMAP, while the SATL for now only
Christoph> supports WRITE SAME with the unmap bit.

I was trying to help Eric figure out why his drive pooped on big Trim
requests.  For WRITE SAME the limit is inherent in the arguments,
whereas our SATL implementation is limited by the 512-byte WRITE SAME
payload.  So I needed a way to convey this up the stack.

Since you already return a B0 VPD page I thought it would be a
convenient place to communicate the max without having to tweak the
queue limits directly from within libata.

You are right that I'm relying on fuzziness in SBC which requires both
the max LBA count and the descriptor count to be specified for UNMAP.

So what I can do is this:

                if (lba_count) {
                        q->limits.max_discard_sectors =
                                lba_count * sector_sz >> 9;

                        if (desc_count)
                                sdkp->unmap = 1;
                }

That way we don't impose limits on "normal" WRITE SAME devices.
Christoph Hellwig - Nov. 24, 2009, 2:35 p.m.
On Sat, Nov 21, 2009 at 03:16:05PM -0500, Martin K. Petersen wrote:
> I was trying to help Eric figure out why his drive pooped on big Trim
> requests.  For WRITE SAME the limit is inherent in the arguments,
> whereas our SATL implementation is limited by the 512-byte WRITE SAME
> payload.  So I needed a way to convey this up the stack.
> 
> Since you already return a B0 VPD page I thought it would be a
> convenient place to communicate the max without having to tweak the
> queue limits directly from within libata.
> 
> You are right that I'm relying on fuzziness in SBC which requires both
> the max LBA count and the descriptor count to be specified for UNMAP.

I think the better way is to make sure we can support any TRIM that
can be sent down.  Given that TRIM is not NCQ-capable we can just
allocate one buffer for the TRIM ranges per TRIM capable device.

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mark Lord - Nov. 24, 2009, 3:20 p.m.
Christoph Hellwig wrote:
> On Sat, Nov 21, 2009 at 03:16:05PM -0500, Martin K. Petersen wrote:
>> I was trying to help Eric figure out why his drive pooped on big Trim
>> requests.  For WRITE SAME the limit is inherent in the arguments,
>> whereas our SATL implementation is limited by the 512-byte WRITE SAME
>> payload.  So I needed a way to convey this up the stack.
>>
>> Since you already return a B0 VPD page I thought it would be a
>> convenient place to communicate the max without having to tweak the
>> queue limits directly from within libata.
>>
>> You are right that I'm relying on fuzziness in SBC which requires both
>> the max LBA count and the descriptor count to be specified for UNMAP.
> 
> I think the better way is to make sure we can support any TRIM that
> can be sent down.  Given that TRIM is not NCQ-capable we can just
> allocate one buffer for the TRIM ranges per TRIM capable device.
..

Good approach.

I suppose that buffer would be only 512 bytes long, per device?
That might be a bit restrictive, as TRIM can handle much larger
requests, and some drives (Indinlinx-based at least) prefer large
TRIM lists at present.

On the other hand, the Marvell chipsets cannot handle more than a
single sector of data without first fixing the driver to work around
chipset bugs.  That's probably unique to sata_mv, though.

Cheers

Mark Lord

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig - Nov. 24, 2009, 3:21 p.m.
On Tue, Nov 24, 2009 at 10:20:28AM -0500, Mark Lord wrote:
> I suppose that buffer would be only 512 bytes long, per device?
> That might be a bit restrictive, as TRIM can handle much larger
> requests, and some drives (Indinlinx-based at least) prefer large
> TRIM lists at present.

Allocating a buffer per device would allow it to be much larger.


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index e0995c4..08d4ab7 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -2116,8 +2116,10 @@  static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf)
 	 * that we support some form of unmap - in thise case via WRITE SAME
 	 * with the unmap bit set.
 	 */
-	if (ata_id_has_trim(args->id))
+	if (ata_id_has_trim(args->id)) {
+		put_unaligned_be32(65535 * 512 / 8, &rbuf[20]);
 		put_unaligned_be32(1, &rbuf[28]);
+	}
 
 	return 0;
 }
@@ -2412,8 +2414,12 @@  static unsigned int ata_scsiop_read_cap(struct ata_scsi_args *args, u8 *rbuf)
 		rbuf[14] = (lowest_aligned >> 8) & 0x3f;
 		rbuf[15] = lowest_aligned;
 
-		if (ata_id_has_trim(args->id))
-			rbuf[14] |= 0x80;
+		if (ata_id_has_trim(args->id)) {
+			rbuf[14] |= 0x80; /* TPE */
+
+			if (ata_id_has_zero_after_trim(args->id))
+				rbuf[14] |= 0x40; /* TPRZ */
+		}
 	}
 
 	return 0;
diff --git a/include/linux/ata.h b/include/linux/ata.h
index e2595e8..b18b2bb 100644
--- a/include/linux/ata.h
+++ b/include/linux/ata.h
@@ -75,6 +75,7 @@  enum {
 	ATA_ID_EIDE_DMA_TIME	= 66,
 	ATA_ID_EIDE_PIO		= 67,
 	ATA_ID_EIDE_PIO_IORDY	= 68,
+	ATA_ID_ADDITIONAL_SUPP	= 69,
 	ATA_ID_QUEUE_DEPTH	= 75,
 	ATA_ID_MAJOR_VER	= 80,
 	ATA_ID_COMMAND_SET_1	= 82,
@@ -816,6 +817,15 @@  static inline int ata_id_has_trim(const u16 *id)
 	return 0;
 }
 
+static inline int ata_id_has_zero_after_trim(const u16 *id)
+{
+	/* DSM supported, deterministic read, and read zero after trim set */
+	if (ata_id_has_trim(id) && id[ATA_ID_ADDITIONAL_SUPP] & 0x4020)
+		return 1;
+
+	return 0;
+}
+
 static inline int ata_id_current_chs_valid(const u16 *id)
 {
 	/* For ATA-1 devices, if the INITIALIZE DEVICE PARAMETERS command