Patchwork libata: add TRIM support

login
register
mail settings
Submitter Christoph Hellwig
Date Nov. 16, 2009, 3:43 p.m.
Message ID <20091116154343.GA6672@infradead.org>
Download mbox | patch
Permalink /patch/38508/
State Not Applicable
Delegated to: David Miller
Headers show

Comments

Christoph Hellwig - Nov. 16, 2009, 3:43 p.m.
Add support for the ATA TRIM command in libata.  We translate a WRITE SAME 16
command with the unmap bit set into an ATA TRIM command and export enough
information in READ CAPACITY 16 and the block limits EVPD page so that the new
SCSI layer discard support will driver this for us.

Note that I hardcode the WRITE_SAME_16 opcode for now as the patch to introduce
the symbolic is not in 2.6.32 yet but only in the SCSI tree - as soon as it is
merged we can fix it up to properly use the symbolic name.

Tested with a OCZ Vertex SSD and my still quite hackish TRIM emulation
for qemu.

Signed-off-by: Christoph Hellwig <hch@lst.de>

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Cox - Nov. 16, 2009, 4:07 p.m.
> +	min_io_sectors = 1;
> +	if ((args->id[106] & 0xc000) == 0x4000 && (args->id[106] & (1 << 13)))
> +		min_io_sectors *= args->id[106] & 0xf;

That magic probably wants to be an ata_id_foo helper of some sort but
otherwise all looks sensible enough for ATA. I'm assuming gcc is smart
enough to collapse that into

	id[106] & 0xE000 != 0x6000

anyway

Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sergei Shtylyov - Nov. 16, 2009, 4:19 p.m.
Hello.

Alan Cox wrote:

>> +	min_io_sectors = 1;
>> +	if ((args->id[106] & 0xc000) == 0x4000 && (args->id[106] & (1 << 13)))
>> +		min_io_sectors *= args->id[106] & 0xf;
>>     
>
> That magic probably wants to be an ata_id_foo helper of some sort but
> otherwise all looks sensible enough for ATA. I'm assuming gcc is smart
> enough to collapse that into
>
> 	id[106] & 0xE000 != 0x6000
>   

   You probably meant ==, not !=. Not sure if gcc is that smart...

WBR, Sergei


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Garzik - Nov. 17, 2009, 3:34 a.m.
Overall, OK.

Needs minor revisions, including one bug fix (tf->device stomping).


> +static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf)
> +{
> +	u32 min_io_sectors;
> +
> +	rbuf[1] = 0xb0;
> +	rbuf[3] = 0x3c;		/* required VPD size with unmap support */
> +
> +	/*
> +	 * Optimal transfer length granularity.
> +	 *
> +	 * This is always one physical block, but for disks with a smaller
> +	 * logical than physical sector size we need to figure out what the
> +	 * latter is.
> +	 */
> +	min_io_sectors = 1;
> +	if ((args->id[106]&  0xc000) == 0x4000&&  (args->id[106]&  (1<<  13)))
> +		min_io_sectors *= args->id[106]&  0xf;

args->id[] access should be via ata_id_* functions from include/linux/ata.h.

Create new ata_id_* as needed.

Plus...  add some whitespace before all C operators, to be consistent 
with the rest of libata.


> +	buf = page_address(sg_page(scsi_sglist(scmd)));
> +	size = ata_set_lba_range_entries(buf, 512 / 8, block, n_block);
> +
> +	tf->protocol = ATA_PROT_DMA;
> +	tf->hob_feature = 0;
> +	tf->feature = ATA_DSM_TRIM;
> +	tf->hob_nsect = (size / 512)>>  8;

needs whitespace before op


> +	tf->nsect = size / 512;
> +	tf->hob_lbal = 0;
> +	tf->lbal = 0;
> +	tf->hob_lbam = 0;
> +	tf->lbam = 0;
> +	tf->hob_lbah = 0;
> +	tf->lbah = 0;

taskfile is pre-zeroed for you (ata_scsi_qc_new -> ata_qc_new_init -> 
ata_qc_reinit -> ata_tf_init), so zap all those zeroing lines.


> +	tf->device = ATA_LBA;

__do not__ overwrite tf->device value.  It is already assigned a useful 
value, which you just stomped.


> +	tf->device = dev->devno ?
> +		tf->device | ATA_DEV1 : tf->device&  ~ATA_DEV1;

delete this; already done in ata_tf_init()


> +	qc->sect_size = ATA_SECT_SIZE;

delete this; already done in ata_qc_reinit()

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mark Lord - Nov. 17, 2009, 2:04 p.m.
Christoph Hellwig wrote:
> Add support for the ATA TRIM command in libata.  We translate a WRITE SAME 16
> command with the unmap bit set into an ATA TRIM command and export enough
> information in READ CAPACITY 16 and the block limits EVPD page so that the new
> SCSI layer discard support will driver this for us.
> 
> Note that I hardcode the WRITE_SAME_16 opcode for now as the patch to introduce
> the symbolic is not in 2.6.32 yet but only in the SCSI tree - as soon as it is
> merged we can fix it up to properly use the symbolic name.
> 
> Tested with a OCZ Vertex SSD and my still quite hackish TRIM emulation
> for qemu.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Index: linux-2.6/drivers/ata/libata-scsi.c
> ===================================================================
> --- linux-2.6.orig/drivers/ata/libata-scsi.c	2009-11-16 16:37:37.776003870 +0100
> +++ linux-2.6/drivers/ata/libata-scsi.c	2009-11-16 16:37:41.864004223 +0100
> @@ -47,6 +47,7 @@
>  #include <linux/hdreg.h>
>  #include <linux/uaccess.h>
>  #include <linux/suspend.h>
> +#include <asm/unaligned.h>
>  
>  #include "libata.h"
>  
> @@ -1964,6 +1965,7 @@ static unsigned int ata_scsiop_inq_00(st
>  		0x80,	/* page 0x80, unit serial no page */
>  		0x83,	/* page 0x83, device ident page */
>  		0x89,	/* page 0x89, ata info page */
> +		0xb0,	/* page 0xb0, block limits page */
>  		0xb1,	/* page 0xb1, block device characteristics page */
>  	};
>  
> @@ -2085,6 +2087,40 @@ static unsigned int ata_scsiop_inq_89(st
>  	return 0;
>  }
>  
> +static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf)
> +{
> +	u32 min_io_sectors;
> +
> +	rbuf[1] = 0xb0;
> +	rbuf[3] = 0x3c;		/* required VPD size with unmap support */
> +
> +	/*
> +	 * Optimal transfer length granularity.
> +	 *
> +	 * This is always one physical block, but for disks with a smaller
> +	 * logical than physical sector size we need to figure out what the
> +	 * latter is.
> +	 */
> +	min_io_sectors = 1;
> +	if ((args->id[106] & 0xc000) == 0x4000 && (args->id[106] & (1 << 13)))
> +		min_io_sectors *= args->id[106] & 0xf;
> +	put_unaligned_be16(min_io_sectors, &rbuf[6]);
> +
> +	/*
> +	 * Optimal unmap granularity.
> +	 *
> +	 * The ATA spec doesn't even know about a granularity or alignment
> +	 * for the TRIM command.  We can leave away most of the unmap related
> +	 * VPD page entries, but we have specifify a granularity to signal
> +	 * that we support some form of unmap - in thise case via WRITE SAME
> +	 * with the unmap bit set.
> +	 */
> +	if (ata_id_has_trim(args->id))
> +		put_unaligned_be32(1, &rbuf[28]);
> +
> +	return 0;
> +}
> +
>  static unsigned int ata_scsiop_inq_b1(struct ata_scsi_args *args, u8 *rbuf)
>  {
>  	int form_factor = ata_id_form_factor(args->id);
> @@ -2374,6 +2410,9 @@ static unsigned int ata_scsiop_read_cap(
>  		rbuf[13] = log_per_phys;
>  		rbuf[14] = (lowest_aligned >> 8) & 0x3f;
>  		rbuf[15] = lowest_aligned;
> +
> +		if (ata_id_has_trim(args->id))
> +			rbuf[14] |= 0x80;
>  	}
>  
>  	return 0;
> @@ -2896,6 +2935,70 @@ static unsigned int ata_scsi_pass_thru(s
>  	return 1;
>  }
>  
> +static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
> +{
> +	struct ata_taskfile *tf = &qc->tf;
> +	struct scsi_cmnd *scmd = qc->scsicmd;
> +	struct ata_device *dev = qc->dev;
> +	const u8 *cdb = scmd->cmnd;
> +	u64 block;
> +	u32 n_block;
> +	u32 size;
> +	void *buf;
> +
> +	/* we may not issue DMA commands if no DMA mode is set */
> +	if (unlikely(!dev->dma_mode))
> +		goto invalid_fld;
> +
> +	if (unlikely(scmd->cmd_len < 16))
> +		goto invalid_fld;
> +	scsi_16_lba_len(cdb, &block, &n_block);
> +
> +	/* for now we only support WRITE SAME with the unmap bit set */
> +	if (unlikely(!(cdb[1] & 0x8)))
> +		goto invalid_fld;
> +
> +	/*
> +	 * WRITE SAME always has a sector sized buffer as payload, this
> +	 * should never be a multiple entry S/G list.
> +	 */
> +	if (!scsi_sg_count(scmd))
> +		goto invalid_fld;
> +
> +	buf = page_address(sg_page(scsi_sglist(scmd)));
> +	size = ata_set_lba_range_entries(buf, 512 / 8, block, n_block);
> +
> +	tf->protocol = ATA_PROT_DMA;
> +	tf->hob_feature = 0;
> +	tf->feature = ATA_DSM_TRIM;
> +	tf->hob_nsect = (size / 512) >> 8;
> +	tf->nsect = size / 512;
> +	tf->hob_lbal = 0;
> +	tf->lbal = 0;
> +	tf->hob_lbam = 0;
> +	tf->lbam = 0;
> +	tf->hob_lbah = 0;
> +	tf->lbah = 0;
> +	tf->device = ATA_LBA;
> +	tf->command = ATA_CMD_DSM;
> +	tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE | ATA_TFLAG_LBA48 |
> +		     ATA_TFLAG_WRITE;
> +	tf->device = dev->devno ?
> +		tf->device | ATA_DEV1 : tf->device & ~ATA_DEV1;
> +
> +	qc->sect_size = ATA_SECT_SIZE;
> +	qc->flags |= ATA_QCFLAG_RESULT_TF | ATA_QCFLAG_QUIET;
> +
> +	ata_qc_set_pc_nbytes(qc);
> +
> +	return 0;
> +
> + invalid_fld:
> +	ata_scsi_set_sense(scmd, ILLEGAL_REQUEST, 0x24, 0x00);
> +	/* "Invalid field in cdb" */
> +	return 1;
> +}
> +
>  /**
>   *	ata_get_xlat_func - check if SCSI to ATA translation is possible
>   *	@dev: ATA device
> @@ -2920,6 +3023,9 @@ static inline ata_xlat_func_t ata_get_xl
>  	case WRITE_16:
>  		return ata_scsi_rw_xlat;
>  
> +	case 0x93 /*WRITE_SAME_16*/:
> +		return ata_scsi_write_same_xlat;
> +
>  	case SYNCHRONIZE_CACHE:
>  		if (ata_try_flush_cache(dev))
>  			return ata_scsi_flush_xlat;
> @@ -3109,6 +3215,9 @@ void ata_scsi_simulate(struct ata_device
>  		case 0x89:
>  			ata_scsi_rbuf_fill(&args, ata_scsiop_inq_89);
>  			break;
> +		case 0xb0:
> +			ata_scsi_rbuf_fill(&args, ata_scsiop_inq_b0);
> +			break;
>  		case 0xb1:
>  			ata_scsi_rbuf_fill(&args, ata_scsiop_inq_b1);
>  			break;
..

Where is the code that sets up the multi-entry trim list
that this command requires as data?  It's probably there,
but I don't see it (blind or something).

And, where is the per-ATA-host flag to enable use of this feature,
with the default being DISABLED.  Or vice versa, if you want to test
all the supported controllers first for compatibility.

Eg. sata_mv, sata_sil, .. won't work as-is with this command.

Thanks

Mark
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig - Nov. 17, 2009, 2:30 p.m.
On Mon, Nov 16, 2009 at 04:07:48PM +0000, Alan Cox wrote:
> > +	min_io_sectors = 1;
> > +	if ((args->id[106] & 0xc000) == 0x4000 && (args->id[106] & (1 << 13)))
> > +		min_io_sectors *= args->id[106] & 0xf;
> 
> That magic probably wants to be an ata_id_foo helper of some sort but
> otherwise all looks sensible enough for ATA.

I've added two helpers for this.

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig - Nov. 17, 2009, 2:32 p.m.
On Mon, Nov 16, 2009 at 10:34:36PM -0500, Jeff Garzik wrote:
> args->id[] access should be via ata_id_* functions from include/linux/ata.h.
>
> Create new ata_id_* as needed.

Fixed.

>> +	tf->lbam = 0;
>> +	tf->hob_lbah = 0;
>> +	tf->lbah = 0;
>
> taskfile is pre-zeroed for you (ata_scsi_qc_new -> ata_qc_new_init ->  
> ata_qc_reinit -> ata_tf_init), so zap all those zeroing lines.

Thanks, tried to figure out if it was but after going a few levels deep
I gave up.

>> +	tf->device = ATA_LBA;
>
> __do not__ overwrite tf->device value.  It is already assigned a useful  
> value, which you just stomped.
>
>
>> +	tf->device = dev->devno ?
>> +		tf->device | ATA_DEV1 : tf->device&  ~ATA_DEV1;
>
> delete this; already done in ata_tf_init()
>
>
>> +	qc->sect_size = ATA_SECT_SIZE;
>
> delete this; already done in ata_qc_reinit()

All this is copy and paste from ata_scsi_pass_thru, but I'll happily fix
it up.

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig - Nov. 17, 2009, 2:35 p.m.
On Tue, Nov 17, 2009 at 09:04:54AM -0500, Mark Lord wrote:
> Where is the code that sets up the multi-entry trim list
> that this command requires as data?  It's probably there,
> but I don't see it (blind or something).

It's not there, as the SCSI command we're translating only supports a
single range.  We can later also add UNMAP support, but it's a lot more
complicated to translate due to the different data formats.  My plan
for now is to get basic discard support done in all layers and then
later optimize it when all infrastructure is in place.

> And, where is the per-ATA-host flag to enable use of this feature,
> with the default being DISABLED.  Or vice versa, if you want to test
> all the supported controllers first for compatibility.
>
> Eg. sata_mv, sata_sil, .. won't work as-is with this command.

I can certainly add a blacklist, do you have a complete list of
controllers that should be blacklisted.  Also is there a good reason
why they are blacklisted or ist just controller bugs?

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig - Nov. 17, 2009, 2:36 p.m.
On Mon, Nov 16, 2009 at 10:34:36PM -0500, Jeff Garzik wrote:
>
>
> Overall, OK.
>
> Needs minor revisions, including one bug fix (tf->device stomping).
>
>
>> +static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf)
>> +{
>> +	u32 min_io_sectors;
>> +
>> +	rbuf[1] = 0xb0;
>> +	rbuf[3] = 0x3c;		/* required VPD size with unmap support */
>> +
>> +	/*
>> +	 * Optimal transfer length granularity.
>> +	 *
>> +	 * This is always one physical block, but for disks with a smaller
>> +	 * logical than physical sector size we need to figure out what the
>> +	 * latter is.
>> +	 */
>> +	min_io_sectors = 1;
>> +	if ((args->id[106]&  0xc000) == 0x4000&&  (args->id[106]&  (1<<  13)))
>> +		min_io_sectors *= args->id[106]&  0xf;
>
> args->id[] access should be via ata_id_* functions from include/linux/ata.h.
>
> Create new ata_id_* as needed.
>
> Plus...  add some whitespace before all C operators, to be consistent  
> with the rest of libata.
>
>
>> +	buf = page_address(sg_page(scsi_sglist(scmd)));
>> +	size = ata_set_lba_range_entries(buf, 512 / 8, block, n_block);
>> +
>> +	tf->protocol = ATA_PROT_DMA;
>> +	tf->hob_feature = 0;
>> +	tf->feature = ATA_DSM_TRIM;
>> +	tf->hob_nsect = (size / 512)>>  8;
>
> needs whitespace before op

I wonder where this got corruped, my local copy doesn't have either
whitespace issue.  Let's hope mutt hasn't started to corrupt patches
lately.

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Cox - Nov. 17, 2009, 2:52 p.m.
> > Eg. sata_mv, sata_sil, .. won't work as-is with this command.
> 
> I can certainly add a blacklist, do you have a complete list of
> controllers that should be blacklisted.  Also is there a good reason
> why they are blacklisted or ist just controller bugs?

I'm not convinced that is the right way to handle it. If we start doing
that we'll have this massive unmanagable set of magic flags and twiddles
for every possible host specific corner case.

Far better to keep the issue of the command as Christoph has done and
block the command issue in the afflicted devices qc_issue so that you get
an error back. That keeps the mess in the right places.

Christoph - the problem with some overly bright and IMHO misdesigned
controllers is that they've got magic tables for command lengths in them
to drive their ATA state machines. Those controllers tend to need special
handlers when new commands come along to explain to them in simple terms
what sort of command it is and wipe their backside afterwards.

That belongs in the crappy hardware drivers not in the core code.

Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig - Nov. 17, 2009, 3 p.m.
On Tue, Nov 17, 2009 at 02:52:39PM +0000, Alan Cox wrote:
> I'm not convinced that is the right way to handle it. If we start doing
> that we'll have this massive unmanagable set of magic flags and twiddles
> for every possible host specific corner case.
> 
> Far better to keep the issue of the command as Christoph has done and
> block the command issue in the afflicted devices qc_issue so that you get
> an error back. That keeps the mess in the right places.

Ok, new version of the patch without this will go out this minute.  I'm
happy to add whatever patch is nessecary to block it for controllers
that can't accept it once we agree on a method.

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Garzik - Nov. 17, 2009, 9:59 p.m.
On 11/17/2009 09:32 AM, Christoph Hellwig wrote:
> On Mon, Nov 16, 2009 at 10:34:36PM -0500, Jeff Garzik wrote:
>> args->id[] access should be via ata_id_* functions from include/linux/ata.h.
>>
>> Create new ata_id_* as needed.
>
> Fixed.
>
>>> +	tf->lbam = 0;
>>> +	tf->hob_lbah = 0;
>>> +	tf->lbah = 0;
>>
>> taskfile is pre-zeroed for you (ata_scsi_qc_new ->  ata_qc_new_init ->
>> ata_qc_reinit ->  ata_tf_init), so zap all those zeroing lines.
>
> Thanks, tried to figure out if it was but after going a few levels deep
> I gave up.

Understandable ;-)


>>> +	tf->device = ATA_LBA;
>>
>> __do not__ overwrite tf->device value.  It is already assigned a useful
>> value, which you just stomped.
>>
>>
>>> +	tf->device = dev->devno ?
>>> +		tf->device | ATA_DEV1 : tf->device&   ~ATA_DEV1;
>>
>> delete this; already done in ata_tf_init()
>>
>>
>>> +	qc->sect_size = ATA_SECT_SIZE;
>>
>> delete this; already done in ata_qc_reinit()
>
> All this is copy and paste from ata_scsi_pass_thru, but I'll happily fix
> it up.

Thanks.

ata_scsi_pass_thru() isn't the best model, since it assumes some 
registers -- notably Device -- are accurately filled in by userland. 
That is the purpose of the pass-through "command," after all.

Pretty much all other sources use the tf->device provided by 
ata_tf_init(), OR'd as appropriate with additional bits.

	Jeff


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mark Lord - Nov. 19, 2009, 3:35 a.m.
Christoph Hellwig wrote:
> On Tue, Nov 17, 2009 at 02:52:39PM +0000, Alan Cox wrote:
>> I'm not convinced that is the right way to handle it. If we start doing
>> that we'll have this massive unmanagable set of magic flags and twiddles
>> for every possible host specific corner case.
>>
>> Far better to keep the issue of the command as Christoph has done and
>> block the command issue in the afflicted devices qc_issue so that you get
>> an error back. That keeps the mess in the right places.
> 
> Ok, new version of the patch without this will go out this minute.  I'm
> happy to add whatever patch is nessecary to block it for controllers
> that can't accept it once we agree on a method.
..

I imagine some sort of port or host flag, that the LLD could set
to indicate that it can correctly handle TRIM.  Or not handle TRIM.

Tejun is pretty good at these flag things.. I wonder what he might suggest?

Cheers
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Cox - Nov. 19, 2009, 10:23 a.m.
> I imagine some sort of port or host flag, that the LLD could set
> to indicate that it can correctly handle TRIM.  Or not handle TRIM.
> 
> Tejun is pretty good at these flag things.. I wonder what he might suggest?

As I said - just error it in your driver qc_issue function. If you are
feeling really arty you can hook do_identify and mask the bit in the
ident data as well. See it821x 'smart' RAID mode for both of those in
pata_it821x.c

Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mark Lord - Nov. 19, 2009, 2:22 p.m.
Alan Cox wrote:
>> I imagine some sort of port or host flag, that the LLD could set
>> to indicate that it can correctly handle TRIM.  Or not handle TRIM.
>>
>> Tejun is pretty good at these flag things.. I wonder what he might suggest?
> 
> As I said - just error it in your driver qc_issue function. If you are
> feeling really arty you can hook do_identify and mask the bit in the
> ident data as well. See it821x 'smart' RAID mode for both of those in
> pata_it821x.c
..

I don't know.. then we have lots of copies of the code,
rather than a single "ATA library" copy of it.

Isn't the idea of libata to provide helper functions and
central feature flagging, to simplify the chipset drivers?

Either way works, but right now we don't actually know all of the
chipsets that can handle this feature, and all of those that cannot.

Using a simple host_flag to disable it (or enable it, either way),
will make it trivial to patch them out/in as they are discovered.

Cheers
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Cox - Nov. 20, 2009, 12:46 p.m.
> > For those that pee themselves you don't need a host flag, you need a
> > qc_issue hook - you need that hook for the purpose of actually making the
> > command work, so in the short term making it just return AC_ERR_DEV isn't
> > exactly hard. Not having the flag and fixing those odd few drivers would
> > be far more productive.
> 
> We are talking about chips whose command set is essentially fixed at 
> manufacture time.

There are only two of those I know of
- IT821x which already uses qc_issue to filter.
- pata_st412, which uses qc_issue and isn't in tree anyway and probably
  never will be.

The SIL ones jut need babysitting. The older Promise at least only snoop
SET_FEATURES and the transfer lengths for some DMA ops. We disable the
features snooping (its there to make driverless windows stuff work well I
assume), and we babysit the transfer lengths for the chip as per the
documentation for issuing LBA48 etc.

> So the alternative is to create a table of known-working commands in 
> each driver, and fail all commands outside that table.  In the case of

For most drivers all commands just work. There are very few with command
tables. LBA48 confused some not because of command tables but because the
double load of the taskfile was a sequence they couldn't hack or because
they snooped the transfer length. The latter is kept correctly by the new
commands.

> SiI and Promise at least, I think that command table will be the same 
> (ATA-7 commands).

It's certainly not quite that simple. A table wouldn't be too bad for
just kernel issued commands as we don't issue very many but for the
general case it would not be pretty at all. The command list for SII is
btw not ATA-7 its a bit more complicated, which is another reason not to
do a command filter but to worry about feature sets we care about.

> At that point, whether it's a common function (is_ata7_command) or a 
> command flag (ATA_HFLAG_NO_NEW_COMMANDS) is largely a point of taste.

SII controllers support a subset of commands based on legacy (WD1010),
ATA and bits of CF. However they also support things like a subset of
smart commands and some extensions. That means a command table isn't
sufficient for the general case, you actually have to parse the features
field if you want to do every case right (ick ick ick)

You don't actually want to do that filtering on SII controllers
anyway because their default behaviour is to abort an unsupported command.
So it all "just works" because the stack sees an abort and assumes the
drive replied with command not supported status (see 10.3.4.2). The chip
designers got that one right.

The SII anyhow supports telling the controller the command behaviour
required, which means that a filter isn't what you need anyway. You need
to set up the command table to allow TRIM through. That means we don't
need a filter, the driver just needs enhancing to fix that quirk.

So who will actually use this flag ? Which other drivers have problems
in this area ? Rather than blindly implement a flag we need to work
through those that *actually* have a problem.

As far as I can see the TRIM stuff will just work as it is right now. The
problem controllers like the IT821x will return AC_ERR_DEV on qc_issue and
the SII controllers will fake a drive abort for command unknown. The
older Promise at least will handle the command.

Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mark Lord - Nov. 21, 2009, 4:33 a.m.
Alan Cox wrote:
>>> For those that pee themselves you don't need a host flag, you need a
>>> qc_issue hook - you need that hook for the purpose of actually making the
>>> command work, so in the short term making it just return AC_ERR_DEV isn't
>>> exactly hard. Not having the flag and fixing those odd few drivers would
>>> be far more productive.
>> We are talking about chips whose command set is essentially fixed at 
>> manufacture time.
> 
> There are only two of those I know of
> - IT821x which already uses qc_issue to filter.
> - pata_st412, which uses qc_issue and isn't in tree anyway and probably
>   never will be.
..

The sata_mv driver will choke the chipset if the TRIM command
involves more than one sector of TRIM data.

Cheers
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Martin K. Petersen - Nov. 21, 2009, 6:09 a.m.
Mark> The sata_mv driver will choke the chipset if the TRIM command
Mark> involves more than one sector of TRIM data.

We never issue more than a single sector.
Mark Lord - Nov. 22, 2009, 2:39 a.m.
Martin K. Petersen wrote:
> Mark> The sata_mv driver will choke the chipset if the TRIM command
> Mark> involves more than one sector of TRIM data.
> 
> We never issue more than a single sector.
..

Today.

:)

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

Index: linux-2.6/drivers/ata/libata-scsi.c
===================================================================
--- linux-2.6.orig/drivers/ata/libata-scsi.c	2009-11-16 16:37:37.776003870 +0100
+++ linux-2.6/drivers/ata/libata-scsi.c	2009-11-16 16:37:41.864004223 +0100
@@ -47,6 +47,7 @@ 
 #include <linux/hdreg.h>
 #include <linux/uaccess.h>
 #include <linux/suspend.h>
+#include <asm/unaligned.h>
 
 #include "libata.h"
 
@@ -1964,6 +1965,7 @@  static unsigned int ata_scsiop_inq_00(st
 		0x80,	/* page 0x80, unit serial no page */
 		0x83,	/* page 0x83, device ident page */
 		0x89,	/* page 0x89, ata info page */
+		0xb0,	/* page 0xb0, block limits page */
 		0xb1,	/* page 0xb1, block device characteristics page */
 	};
 
@@ -2085,6 +2087,40 @@  static unsigned int ata_scsiop_inq_89(st
 	return 0;
 }
 
+static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf)
+{
+	u32 min_io_sectors;
+
+	rbuf[1] = 0xb0;
+	rbuf[3] = 0x3c;		/* required VPD size with unmap support */
+
+	/*
+	 * Optimal transfer length granularity.
+	 *
+	 * This is always one physical block, but for disks with a smaller
+	 * logical than physical sector size we need to figure out what the
+	 * latter is.
+	 */
+	min_io_sectors = 1;
+	if ((args->id[106] & 0xc000) == 0x4000 && (args->id[106] & (1 << 13)))
+		min_io_sectors *= args->id[106] & 0xf;
+	put_unaligned_be16(min_io_sectors, &rbuf[6]);
+
+	/*
+	 * Optimal unmap granularity.
+	 *
+	 * The ATA spec doesn't even know about a granularity or alignment
+	 * for the TRIM command.  We can leave away most of the unmap related
+	 * VPD page entries, but we have specifify a granularity to signal
+	 * that we support some form of unmap - in thise case via WRITE SAME
+	 * with the unmap bit set.
+	 */
+	if (ata_id_has_trim(args->id))
+		put_unaligned_be32(1, &rbuf[28]);
+
+	return 0;
+}
+
 static unsigned int ata_scsiop_inq_b1(struct ata_scsi_args *args, u8 *rbuf)
 {
 	int form_factor = ata_id_form_factor(args->id);
@@ -2374,6 +2410,9 @@  static unsigned int ata_scsiop_read_cap(
 		rbuf[13] = log_per_phys;
 		rbuf[14] = (lowest_aligned >> 8) & 0x3f;
 		rbuf[15] = lowest_aligned;
+
+		if (ata_id_has_trim(args->id))
+			rbuf[14] |= 0x80;
 	}
 
 	return 0;
@@ -2896,6 +2935,70 @@  static unsigned int ata_scsi_pass_thru(s
 	return 1;
 }
 
+static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
+{
+	struct ata_taskfile *tf = &qc->tf;
+	struct scsi_cmnd *scmd = qc->scsicmd;
+	struct ata_device *dev = qc->dev;
+	const u8 *cdb = scmd->cmnd;
+	u64 block;
+	u32 n_block;
+	u32 size;
+	void *buf;
+
+	/* we may not issue DMA commands if no DMA mode is set */
+	if (unlikely(!dev->dma_mode))
+		goto invalid_fld;
+
+	if (unlikely(scmd->cmd_len < 16))
+		goto invalid_fld;
+	scsi_16_lba_len(cdb, &block, &n_block);
+
+	/* for now we only support WRITE SAME with the unmap bit set */
+	if (unlikely(!(cdb[1] & 0x8)))
+		goto invalid_fld;
+
+	/*
+	 * WRITE SAME always has a sector sized buffer as payload, this
+	 * should never be a multiple entry S/G list.
+	 */
+	if (!scsi_sg_count(scmd))
+		goto invalid_fld;
+
+	buf = page_address(sg_page(scsi_sglist(scmd)));
+	size = ata_set_lba_range_entries(buf, 512 / 8, block, n_block);
+
+	tf->protocol = ATA_PROT_DMA;
+	tf->hob_feature = 0;
+	tf->feature = ATA_DSM_TRIM;
+	tf->hob_nsect = (size / 512) >> 8;
+	tf->nsect = size / 512;
+	tf->hob_lbal = 0;
+	tf->lbal = 0;
+	tf->hob_lbam = 0;
+	tf->lbam = 0;
+	tf->hob_lbah = 0;
+	tf->lbah = 0;
+	tf->device = ATA_LBA;
+	tf->command = ATA_CMD_DSM;
+	tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE | ATA_TFLAG_LBA48 |
+		     ATA_TFLAG_WRITE;
+	tf->device = dev->devno ?
+		tf->device | ATA_DEV1 : tf->device & ~ATA_DEV1;
+
+	qc->sect_size = ATA_SECT_SIZE;
+	qc->flags |= ATA_QCFLAG_RESULT_TF | ATA_QCFLAG_QUIET;
+
+	ata_qc_set_pc_nbytes(qc);
+
+	return 0;
+
+ invalid_fld:
+	ata_scsi_set_sense(scmd, ILLEGAL_REQUEST, 0x24, 0x00);
+	/* "Invalid field in cdb" */
+	return 1;
+}
+
 /**
  *	ata_get_xlat_func - check if SCSI to ATA translation is possible
  *	@dev: ATA device
@@ -2920,6 +3023,9 @@  static inline ata_xlat_func_t ata_get_xl
 	case WRITE_16:
 		return ata_scsi_rw_xlat;
 
+	case 0x93 /*WRITE_SAME_16*/:
+		return ata_scsi_write_same_xlat;
+
 	case SYNCHRONIZE_CACHE:
 		if (ata_try_flush_cache(dev))
 			return ata_scsi_flush_xlat;
@@ -3109,6 +3215,9 @@  void ata_scsi_simulate(struct ata_device
 		case 0x89:
 			ata_scsi_rbuf_fill(&args, ata_scsiop_inq_89);
 			break;
+		case 0xb0:
+			ata_scsi_rbuf_fill(&args, ata_scsiop_inq_b0);
+			break;
 		case 0xb1:
 			ata_scsi_rbuf_fill(&args, ata_scsiop_inq_b1);
 			break;