diff mbox

[dm-devel] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]

Message ID yq1wri6zel9.fsf@sermon.lab.mkp.net
State Not Applicable, archived
Headers show

Commit Message

Martin K. Petersen May 4, 2011, 3:10 p.m. UTC
>>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes:

Lukas> Nevertheless there is something weird going on, because even when
Lukas> I create striped volume I get this:

Could you please try the following patch? It has a bunch of small tweaks
to the discard stack in it. I'll split it up before posting for real but
I'd like to know if it fixes your issue...


block/libata/scsi: Various logical block provisioning fixes

 - Add sysfs documentation for the discard topology parameters

 - Fix discard stacking problem

 - Switch our libata SAT over to using the WRITE SAME limits

 - UNMAP alignment needs to be converted to bytes

 - Only report alignment and zeroes_data if the device supports discard

Reported-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Mike Snitzer May 4, 2011, 4:02 p.m. UTC | #1
On Wed, May 04 2011 at 11:10am -0400,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> >>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes:
> 
> Lukas> Nevertheless there is something weird going on, because even when
> Lukas> I create striped volume I get this:
> 
> Could you please try the following patch? It has a bunch of small tweaks
> to the discard stack in it. I'll split it up before posting for real but
> I'd like to know if it fixes your issue...
> 
> 
> block/libata/scsi: Various logical block provisioning fixes
> 
>  - Add sysfs documentation for the discard topology parameters
> 
>  - Fix discard stacking problem
> 
>  - Switch our libata SAT over to using the WRITE SAME limits
> 
>  - UNMAP alignment needs to be converted to bytes
> 
>  - Only report alignment and zeroes_data if the device supports discard
> 
> Reported-by: Lukas Czerner <lczerner@redhat.com>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> 
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 1fa7692..42d3bf5 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -120,7 +120,7 @@ void blk_set_default_limits(struct queue_limits *lim)
>  	lim->discard_granularity = 0;
>  	lim->discard_alignment = 0;
>  	lim->discard_misaligned = 0;
> -	lim->discard_zeroes_data = -1;
> +	lim->discard_zeroes_data = 1;
>  	lim->logical_block_size = lim->physical_block_size = lim->io_min = 512;
>  	lim->bounce_pfn = (unsigned long)(BLK_BOUNCE_ANY >> PAGE_SHIFT);
>  	lim->alignment_offset = 0;

lim->discard_zeroes_data = -1; was suspect to me too.
But why default to 1 here?

> @@ -166,6 +166,7 @@ void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn)
>  
>  	blk_set_default_limits(&q->limits);
>  	blk_queue_max_hw_sectors(q, BLK_SAFE_MAX_SECTORS);
> +	q->limits.discard_zeroes_data = 0;
>  
>  	/*
>  	 * by default assume old behaviour and bounce for any highmem page

Only to then reset to 0 here?  Shouldn't we default to 0 and only set to
1 where applicable (e.g. sd_config_discard)?
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Martin K. Petersen May 4, 2011, 4:50 p.m. UTC | #2
>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:

Mike> lim->discard_zeroes_data = -1; was suspect to me too.
Mike> But why default to 1 here?

Because otherwise DM would default to having dzd to "unsupported",
meaning the feature would never be turned on regardless of the bottom
device capabilities.

The old approach used the -1 value to indicate "has not been set". That
was only really intended as a value for the stacking drivers, not for
the LLDs. It was a bit of a hack and I'd rather deal with dzd the same
way as we do with clustering.


>> @@ -166,6 +166,7 @@ void blk_queue_make_request(struct request_queue
>> *q, make_request_fn *mfn)
>> 
>> blk_set_default_limits(&q->limits);
>> blk_queue_max_hw_sectors(q, BLK_SAFE_MAX_SECTORS);
>> + q->limits.discard_zeroes_data = 0;
>> 
>> /*
>> * by default assume old behaviour and bounce for any highmem page

Mike> Only to then reset to 0 here?  Shouldn't we default to 0 and only
Mike> set to 1 where applicable (e.g. sd_config_discard)?

My first approach was to set it in dm-table.c before stacking. But I
thought it was icky to have the stacking driver ask for defaults and
then have to tweak them for things to work correctly.

The other option is to have blk_set_default_stacking_limits(). Or we
could add a flag to blk_set_default_limits to indicate whether this is a
LLD or a stacking driver.

We already special-case BLK_SAFE_MAX_SECTORS when setting the request
function. And that's the only non-stacking user of the default limits
call. So that's why I disabled dzd there. Since this is a stable bugfix
I also wanted to keep it small and simple. But I'm totally open to
suggestions.
Lukas Czerner May 4, 2011, 5:10 p.m. UTC | #3
On Wed, 4 May 2011, Martin K. Petersen wrote:

> >>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes:
> 
> Lukas> Nevertheless there is something weird going on, because even when
> Lukas> I create striped volume I get this:
> 
> Could you please try the following patch? It has a bunch of small tweaks
> to the discard stack in it. I'll split it up before posting for real but
> I'd like to know if it fixes your issue...

It works thanks! This is before the patch applied:

NAME                       DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sdb                               0        0B       0B         0
├─sdb1                   4293918720        0B       0B         0
├─sdb2                   4293918720        0B       0B         0
└─sdb3                   4293918720        0B       0B         0
  └─vg_test-lvol0 (dm-0)          0      512B      16E         1
sdd                               0      512B      16E         1
└─sdd1                            0      512B      16E         1
  └─vg_test-lvol0 (dm-0)          0      512B      16E         1

and this is with the patch:

NAME                     DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sdb                             0        0B       0B         0
├─sdb1                          0        0B       0B         0
├─sdb2                          0        0B       0B         0
└─sdb3                          0        0B       0B         0
  └─vg_test-lvol0 (dm-0)        0      512B       2G         0
sdd                             0      512B       2G         1
└─sdd1                          0      512B       2G         1
  └─vg_test-lvol0 (dm-0)        0      512B       2G         0

It also works properly with two devices supporting discard.

NAME                     DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sdd                             0      512B       2G         1
├─sdd1                          0      512B       2G         1
│ └─vg_test-lvol0 (dm-0)        0      512B       2G         1
└─sdd2                          0      512B       2G         1
  └─vg_test-lvol0 (dm-0)        0      512B       2G         1


Also I am not sure why this changed discard_max_bytes from 4294966784
to 2147450880 in my case, that does not seem right...lsblk in the first
place apparently overflowed.

Thanks for solving this!
-Lukas


> 
> 
> block/libata/scsi: Various logical block provisioning fixes
> 
>  - Add sysfs documentation for the discard topology parameters
> 
>  - Fix discard stacking problem
> 
>  - Switch our libata SAT over to using the WRITE SAME limits
> 
>  - UNMAP alignment needs to be converted to bytes
> 
>  - Only report alignment and zeroes_data if the device supports discard
> 
> Reported-by: Lukas Czerner <lczerner@redhat.com>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> 
> diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
> index 4873c75..c1eb41c 100644
> --- a/Documentation/ABI/testing/sysfs-block
> +++ b/Documentation/ABI/testing/sysfs-block
> @@ -142,3 +142,67 @@ Description:
>  		with the previous I/O request are enabled. When set to 2,
>  		all merge tries are disabled. The default value is 0 -
>  		which enables all types of merge tries.
> +
> +What:		/sys/block/<disk>/discard_alignment
> +Date:		May 2011
> +Contact:	Martin K. Petersen <martin.petersen@oracle.com>
> +Description:
> +		Devices that support discard functionality may
> +		internally allocate space in units that are bigger than
> +		the exported logical block size. The discard_alignment
> +		parameter indicates how many bytes the beginning of the
> +		device is offset from the internal allocation unit's
> +		natural alignment.
> +
> +What:		/sys/block/<disk>/<partition>/discard_alignment
> +Date:		May 2011
> +Contact:	Martin K. Petersen <martin.petersen@oracle.com>
> +Description:
> +		Devices that support discard functionality may
> +		internally allocate space in units that are bigger than
> +		the exported logical block size. The discard_alignment
> +		parameter indicates how many bytes the beginning of the
> +		partition is offset from the internal allocation unit's
> +		natural alignment.
> +
> +What:		/sys/block/<disk>/queue/discard_granularity
> +Date:		May 2011
> +Contact:	Martin K. Petersen <martin.petersen@oracle.com>
> +Description:
> +		Devices that support discard functionality may
> +		internally allocate space using units that are bigger
> +		than the logical block size. The discard_granularity
> +		parameter indicates the size of the internal allocation
> +		unit in bytes if reported by the device. Otherwise the
> +		discard_granularity will be set to match the device's
> +		physical block size. A discard_granularity of 0 means
> +		that the device does not support discard functionality.
> +
> +What:		/sys/block/<disk>/queue/discard_max_bytes
> +Date:		May 2011
> +Contact:	Martin K. Petersen <martin.petersen@oracle.com>
> +Description:
> +		Devices that support discard functionality may have
> +		internal limits on the number of bytes that can be
> +		trimmed or unmapped in a single operation. Some storage
> +		protocols also have inherent limits on the number of
> +		blocks that can be described in a single command. The
> +		discard_max_bytes parameter is set by the device driver
> +		to the maximum number of bytes that can be discarded in
> +		a single operation. Discard requests issued to the
> +		device must not exceed this limit. A discard_max_bytes
> +		value of 0 means that the device does not support
> +		discard functionality.
> +
> +What:		/sys/block/<disk>/queue/discard_zeroes_data
> +Date:		May 2011
> +Contact:	Martin K. Petersen <martin.petersen@oracle.com>
> +Description:
> +		Devices that support discard functionality may return
> +		stale or random data when a previously discarded block
> +		is read back. This can cause problems if the filesystem
> +		expects discarded blocks to be explicitly cleared. If a
> +		device reports that it deterministically returns zeroes
> +		when a discarded area is read the discard_zeroes_data
> +		parameter will be set to one. Otherwise it will be 0 and
> +		the result of reading a discarded area is undefined.
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 1fa7692..42d3bf5 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -120,7 +120,7 @@ void blk_set_default_limits(struct queue_limits *lim)
>  	lim->discard_granularity = 0;
>  	lim->discard_alignment = 0;
>  	lim->discard_misaligned = 0;
> -	lim->discard_zeroes_data = -1;
> +	lim->discard_zeroes_data = 1;
>  	lim->logical_block_size = lim->physical_block_size = lim->io_min = 512;
>  	lim->bounce_pfn = (unsigned long)(BLK_BOUNCE_ANY >> PAGE_SHIFT);
>  	lim->alignment_offset = 0;
> @@ -166,6 +166,7 @@ void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn)
>  
>  	blk_set_default_limits(&q->limits);
>  	blk_queue_max_hw_sectors(q, BLK_SAFE_MAX_SECTORS);
> +	q->limits.discard_zeroes_data = 0;
>  
>  	/*
>  	 * by default assume old behaviour and bounce for any highmem page
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index e2f57e9e..30ea95f 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -2138,7 +2138,7 @@ static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf)
>  	 * with the unmap bit set.
>  	 */
>  	if (ata_id_has_trim(args->id)) {
> -		put_unaligned_be32(65535 * 512 / 8, &rbuf[20]);
> +		put_unaligned_be64(65535 * 512 / 8, &rbuf[36]);
>  		put_unaligned_be32(1, &rbuf[28]);
>  	}
>  
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index bd0806e..cc081dd 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -490,7 +490,8 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
>  	unsigned int max_blocks = 0;
>  
>  	q->limits.discard_zeroes_data = sdkp->lbprz;
> -	q->limits.discard_alignment = sdkp->unmap_alignment;
> +	q->limits.discard_alignment = sdkp->unmap_alignment *
> +		(logical_block_size >> 9);
>  	q->limits.discard_granularity =
>  		max(sdkp->physical_block_size,
>  		    sdkp->unmap_granularity * logical_block_size);
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 2ad95fa..1416e3c 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -257,7 +257,7 @@ struct queue_limits {
>  	unsigned char		misaligned;
>  	unsigned char		discard_misaligned;
>  	unsigned char		cluster;
> -	signed char		discard_zeroes_data;
> +	unsigned char		discard_zeroes_data;
>  };
>  
>  struct request_queue
> @@ -1066,13 +1066,16 @@ static inline int queue_limit_discard_alignment(struct queue_limits *lim, sector
>  {
>  	unsigned int alignment = (sector << 9) & (lim->discard_granularity - 1);
>  
> +	if (!lim->max_discard_sectors)
> +		return 0;
> +
>  	return (lim->discard_granularity + lim->discard_alignment - alignment)
>  		& (lim->discard_granularity - 1);
>  }
>  
>  static inline unsigned int queue_discard_zeroes_data(struct request_queue *q)
>  {
> -	if (q->limits.discard_zeroes_data == 1)
> +	if (q->limits.max_discard_sectors && q->limits.discard_zeroes_data == 1)
>  		return 1;
>  
>  	return 0;
> 

--
Martin K. Petersen May 4, 2011, 5:32 p.m. UTC | #4
>>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes:

Lukas> Also I am not sure why this changed discard_max_bytes from
Lukas> 4294966784 to 2147450880 in my case, that does not seem
Lukas> right...lsblk in the first place apparently overflowed.

2G is correct. The relevant libata patch was missing from the series I
submitted during the merge window and I don't see it in the archives. It
was at the end of the stack so I guess I had an off-by-one in my commit
range...
Lukas Czerner May 4, 2011, 5:35 p.m. UTC | #5
On Wed, 4 May 2011, Martin K. Petersen wrote:

> >>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes:
> 
> Lukas> Also I am not sure why this changed discard_max_bytes from
> Lukas> 4294966784 to 2147450880 in my case, that does not seem
> Lukas> right...lsblk in the first place apparently overflowed.
> 
> 2G is correct. The relevant libata patch was missing from the series I
> submitted during the merge window and I don't see it in the archives. It
> was at the end of the stack so I guess I had an off-by-one in my commit
> range...
> 

Ok, thanks.

-Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mike Snitzer May 4, 2011, 6:03 p.m. UTC | #6
On Wed, May 04 2011 at 12:50pm -0400,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
> 
> Mike> lim->discard_zeroes_data = -1; was suspect to me too.
> Mike> But why default to 1 here?
> 
> Because otherwise DM would default to having dzd to "unsupported",
> meaning the feature would never be turned on regardless of the bottom
> device capabilities.
> 
> The old approach used the -1 value to indicate "has not been set". That
> was only really intended as a value for the stacking drivers, not for
> the LLDs. It was a bit of a hack and I'd rather deal with dzd the same
> way as we do with clustering.
> 
> 
> >> @@ -166,6 +166,7 @@ void blk_queue_make_request(struct request_queue
> >> *q, make_request_fn *mfn)
> >> 
> >> blk_set_default_limits(&q->limits);
> >> blk_queue_max_hw_sectors(q, BLK_SAFE_MAX_SECTORS);
> >> + q->limits.discard_zeroes_data = 0;
> >> 
> >> /*
> >> * by default assume old behaviour and bounce for any highmem page
> 
> Mike> Only to then reset to 0 here?  Shouldn't we default to 0 and only
> Mike> set to 1 where applicable (e.g. sd_config_discard)?
> 
> My first approach was to set it in dm-table.c before stacking. But I
> thought it was icky to have the stacking driver ask for defaults and
> then have to tweak them for things to work correctly.
> 
> The other option is to have blk_set_default_stacking_limits(). Or we
> could add a flag to blk_set_default_limits to indicate whether this is a
> LLD or a stacking driver.
> 
> We already special-case BLK_SAFE_MAX_SECTORS when setting the request
> function. And that's the only non-stacking user of the default limits
> call. So that's why I disabled dzd there. Since this is a stable bugfix
> I also wanted to keep it small and simple. But I'm totally open to
> suggestions.

Your current approach sounds good.  Might be good to briefly speak to
the duality of the stacking vs non-stacking approach in the associated
patch header.

Thanks for clarifying.

Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mike Snitzer May 18, 2011, 12:16 p.m. UTC | #7
Hey Martin,

On Wed, May 04 2011 at 11:10am -0400,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> >>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes:
> 
> Lukas> Nevertheless there is something weird going on, because even when
> Lukas> I create striped volume I get this:
> 
> Could you please try the following patch? It has a bunch of small tweaks
> to the discard stack in it. I'll split it up before posting for real but
> I'd like to know if it fixes your issue...

Would be ideal to get these fixes staged for 2.6.40.

Do you intend to split these patches up and post for 2.6.40 inclussion?

Please advise, thanks!


> block/libata/scsi: Various logical block provisioning fixes
> 
>  - Add sysfs documentation for the discard topology parameters
> 
>  - Fix discard stacking problem
> 
>  - Switch our libata SAT over to using the WRITE SAME limits
> 
>  - UNMAP alignment needs to be converted to bytes
> 
>  - Only report alignment and zeroes_data if the device supports discard
> 
> Reported-by: Lukas Czerner <lczerner@redhat.com>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> 
> diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
> index 4873c75..c1eb41c 100644
> --- a/Documentation/ABI/testing/sysfs-block
> +++ b/Documentation/ABI/testing/sysfs-block
> @@ -142,3 +142,67 @@ Description:
>  		with the previous I/O request are enabled. When set to 2,
>  		all merge tries are disabled. The default value is 0 -
>  		which enables all types of merge tries.
> +
> +What:		/sys/block/<disk>/discard_alignment
> +Date:		May 2011
> +Contact:	Martin K. Petersen <martin.petersen@oracle.com>
> +Description:
> +		Devices that support discard functionality may
> +		internally allocate space in units that are bigger than
> +		the exported logical block size. The discard_alignment
> +		parameter indicates how many bytes the beginning of the
> +		device is offset from the internal allocation unit's
> +		natural alignment.
> +
> +What:		/sys/block/<disk>/<partition>/discard_alignment
> +Date:		May 2011
> +Contact:	Martin K. Petersen <martin.petersen@oracle.com>
> +Description:
> +		Devices that support discard functionality may
> +		internally allocate space in units that are bigger than
> +		the exported logical block size. The discard_alignment
> +		parameter indicates how many bytes the beginning of the
> +		partition is offset from the internal allocation unit's
> +		natural alignment.
> +
> +What:		/sys/block/<disk>/queue/discard_granularity
> +Date:		May 2011
> +Contact:	Martin K. Petersen <martin.petersen@oracle.com>
> +Description:
> +		Devices that support discard functionality may
> +		internally allocate space using units that are bigger
> +		than the logical block size. The discard_granularity
> +		parameter indicates the size of the internal allocation
> +		unit in bytes if reported by the device. Otherwise the
> +		discard_granularity will be set to match the device's
> +		physical block size. A discard_granularity of 0 means
> +		that the device does not support discard functionality.
> +
> +What:		/sys/block/<disk>/queue/discard_max_bytes
> +Date:		May 2011
> +Contact:	Martin K. Petersen <martin.petersen@oracle.com>
> +Description:
> +		Devices that support discard functionality may have
> +		internal limits on the number of bytes that can be
> +		trimmed or unmapped in a single operation. Some storage
> +		protocols also have inherent limits on the number of
> +		blocks that can be described in a single command. The
> +		discard_max_bytes parameter is set by the device driver
> +		to the maximum number of bytes that can be discarded in
> +		a single operation. Discard requests issued to the
> +		device must not exceed this limit. A discard_max_bytes
> +		value of 0 means that the device does not support
> +		discard functionality.
> +
> +What:		/sys/block/<disk>/queue/discard_zeroes_data
> +Date:		May 2011
> +Contact:	Martin K. Petersen <martin.petersen@oracle.com>
> +Description:
> +		Devices that support discard functionality may return
> +		stale or random data when a previously discarded block
> +		is read back. This can cause problems if the filesystem
> +		expects discarded blocks to be explicitly cleared. If a
> +		device reports that it deterministically returns zeroes
> +		when a discarded area is read the discard_zeroes_data
> +		parameter will be set to one. Otherwise it will be 0 and
> +		the result of reading a discarded area is undefined.
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 1fa7692..42d3bf5 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -120,7 +120,7 @@ void blk_set_default_limits(struct queue_limits *lim)
>  	lim->discard_granularity = 0;
>  	lim->discard_alignment = 0;
>  	lim->discard_misaligned = 0;
> -	lim->discard_zeroes_data = -1;
> +	lim->discard_zeroes_data = 1;
>  	lim->logical_block_size = lim->physical_block_size = lim->io_min = 512;
>  	lim->bounce_pfn = (unsigned long)(BLK_BOUNCE_ANY >> PAGE_SHIFT);
>  	lim->alignment_offset = 0;
> @@ -166,6 +166,7 @@ void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn)
>  
>  	blk_set_default_limits(&q->limits);
>  	blk_queue_max_hw_sectors(q, BLK_SAFE_MAX_SECTORS);
> +	q->limits.discard_zeroes_data = 0;
>  
>  	/*
>  	 * by default assume old behaviour and bounce for any highmem page
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index e2f57e9e..30ea95f 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -2138,7 +2138,7 @@ static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf)
>  	 * with the unmap bit set.
>  	 */
>  	if (ata_id_has_trim(args->id)) {
> -		put_unaligned_be32(65535 * 512 / 8, &rbuf[20]);
> +		put_unaligned_be64(65535 * 512 / 8, &rbuf[36]);
>  		put_unaligned_be32(1, &rbuf[28]);
>  	}
>  
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index bd0806e..cc081dd 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -490,7 +490,8 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
>  	unsigned int max_blocks = 0;
>  
>  	q->limits.discard_zeroes_data = sdkp->lbprz;
> -	q->limits.discard_alignment = sdkp->unmap_alignment;
> +	q->limits.discard_alignment = sdkp->unmap_alignment *
> +		(logical_block_size >> 9);
>  	q->limits.discard_granularity =
>  		max(sdkp->physical_block_size,
>  		    sdkp->unmap_granularity * logical_block_size);
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 2ad95fa..1416e3c 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -257,7 +257,7 @@ struct queue_limits {
>  	unsigned char		misaligned;
>  	unsigned char		discard_misaligned;
>  	unsigned char		cluster;
> -	signed char		discard_zeroes_data;
> +	unsigned char		discard_zeroes_data;
>  };
>  
>  struct request_queue
> @@ -1066,13 +1066,16 @@ static inline int queue_limit_discard_alignment(struct queue_limits *lim, sector
>  {
>  	unsigned int alignment = (sector << 9) & (lim->discard_granularity - 1);
>  
> +	if (!lim->max_discard_sectors)
> +		return 0;
> +
>  	return (lim->discard_granularity + lim->discard_alignment - alignment)
>  		& (lim->discard_granularity - 1);
>  }
>  
>  static inline unsigned int queue_discard_zeroes_data(struct request_queue *q)
>  {
> -	if (q->limits.discard_zeroes_data == 1)
> +	if (q->limits.max_discard_sectors && q->limits.discard_zeroes_data == 1)
>  		return 1;
>  
>  	return 0;
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mike Snitzer May 18, 2011, 12:52 p.m. UTC | #8
On Wed, May 18 2011 at  8:16am -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> Hey Martin,
> 
> On Wed, May 04 2011 at 11:10am -0400,
> Martin K. Petersen <martin.petersen@oracle.com> wrote:
> 
> > >>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes:
> > 
> > Lukas> Nevertheless there is something weird going on, because even when
> > Lukas> I create striped volume I get this:
> > 
> > Could you please try the following patch? It has a bunch of small tweaks
> > to the discard stack in it. I'll split it up before posting for real but
> > I'd like to know if it fixes your issue...
> 
> Would be ideal to get these fixes staged for 2.6.40.
> 
> Do you intend to split these patches up and post for 2.6.40 inclussion?

Ah, I now see you posted them and Jens picked some up.. thanks

Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index 4873c75..c1eb41c 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -142,3 +142,67 @@  Description:
 		with the previous I/O request are enabled. When set to 2,
 		all merge tries are disabled. The default value is 0 -
 		which enables all types of merge tries.
+
+What:		/sys/block/<disk>/discard_alignment
+Date:		May 2011
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		Devices that support discard functionality may
+		internally allocate space in units that are bigger than
+		the exported logical block size. The discard_alignment
+		parameter indicates how many bytes the beginning of the
+		device is offset from the internal allocation unit's
+		natural alignment.
+
+What:		/sys/block/<disk>/<partition>/discard_alignment
+Date:		May 2011
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		Devices that support discard functionality may
+		internally allocate space in units that are bigger than
+		the exported logical block size. The discard_alignment
+		parameter indicates how many bytes the beginning of the
+		partition is offset from the internal allocation unit's
+		natural alignment.
+
+What:		/sys/block/<disk>/queue/discard_granularity
+Date:		May 2011
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		Devices that support discard functionality may
+		internally allocate space using units that are bigger
+		than the logical block size. The discard_granularity
+		parameter indicates the size of the internal allocation
+		unit in bytes if reported by the device. Otherwise the
+		discard_granularity will be set to match the device's
+		physical block size. A discard_granularity of 0 means
+		that the device does not support discard functionality.
+
+What:		/sys/block/<disk>/queue/discard_max_bytes
+Date:		May 2011
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		Devices that support discard functionality may have
+		internal limits on the number of bytes that can be
+		trimmed or unmapped in a single operation. Some storage
+		protocols also have inherent limits on the number of
+		blocks that can be described in a single command. The
+		discard_max_bytes parameter is set by the device driver
+		to the maximum number of bytes that can be discarded in
+		a single operation. Discard requests issued to the
+		device must not exceed this limit. A discard_max_bytes
+		value of 0 means that the device does not support
+		discard functionality.
+
+What:		/sys/block/<disk>/queue/discard_zeroes_data
+Date:		May 2011
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		Devices that support discard functionality may return
+		stale or random data when a previously discarded block
+		is read back. This can cause problems if the filesystem
+		expects discarded blocks to be explicitly cleared. If a
+		device reports that it deterministically returns zeroes
+		when a discarded area is read the discard_zeroes_data
+		parameter will be set to one. Otherwise it will be 0 and
+		the result of reading a discarded area is undefined.
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 1fa7692..42d3bf5 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -120,7 +120,7 @@  void blk_set_default_limits(struct queue_limits *lim)
 	lim->discard_granularity = 0;
 	lim->discard_alignment = 0;
 	lim->discard_misaligned = 0;
-	lim->discard_zeroes_data = -1;
+	lim->discard_zeroes_data = 1;
 	lim->logical_block_size = lim->physical_block_size = lim->io_min = 512;
 	lim->bounce_pfn = (unsigned long)(BLK_BOUNCE_ANY >> PAGE_SHIFT);
 	lim->alignment_offset = 0;
@@ -166,6 +166,7 @@  void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn)
 
 	blk_set_default_limits(&q->limits);
 	blk_queue_max_hw_sectors(q, BLK_SAFE_MAX_SECTORS);
+	q->limits.discard_zeroes_data = 0;
 
 	/*
 	 * by default assume old behaviour and bounce for any highmem page
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index e2f57e9e..30ea95f 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -2138,7 +2138,7 @@  static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf)
 	 * with the unmap bit set.
 	 */
 	if (ata_id_has_trim(args->id)) {
-		put_unaligned_be32(65535 * 512 / 8, &rbuf[20]);
+		put_unaligned_be64(65535 * 512 / 8, &rbuf[36]);
 		put_unaligned_be32(1, &rbuf[28]);
 	}
 
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index bd0806e..cc081dd 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -490,7 +490,8 @@  static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
 	unsigned int max_blocks = 0;
 
 	q->limits.discard_zeroes_data = sdkp->lbprz;
-	q->limits.discard_alignment = sdkp->unmap_alignment;
+	q->limits.discard_alignment = sdkp->unmap_alignment *
+		(logical_block_size >> 9);
 	q->limits.discard_granularity =
 		max(sdkp->physical_block_size,
 		    sdkp->unmap_granularity * logical_block_size);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2ad95fa..1416e3c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -257,7 +257,7 @@  struct queue_limits {
 	unsigned char		misaligned;
 	unsigned char		discard_misaligned;
 	unsigned char		cluster;
-	signed char		discard_zeroes_data;
+	unsigned char		discard_zeroes_data;
 };
 
 struct request_queue
@@ -1066,13 +1066,16 @@  static inline int queue_limit_discard_alignment(struct queue_limits *lim, sector
 {
 	unsigned int alignment = (sector << 9) & (lim->discard_granularity - 1);
 
+	if (!lim->max_discard_sectors)
+		return 0;
+
 	return (lim->discard_granularity + lim->discard_alignment - alignment)
 		& (lim->discard_granularity - 1);
 }
 
 static inline unsigned int queue_discard_zeroes_data(struct request_queue *q)
 {
-	if (q->limits.discard_zeroes_data == 1)
+	if (q->limits.max_discard_sectors && q->limits.discard_zeroes_data == 1)
 		return 1;
 
 	return 0;