diff mbox

configurable discard parameters

Message ID CAGnHSEmDQ6xaJsw9i=v23BffWQ8RncrgNBsGWy3ziC36QK+9yQ@mail.gmail.com
State Not Applicable
Delegated to: David Miller
Headers show

Commit Message

Tom Yan June 20, 2015, 5:12 p.m. UTC
Today I check if blkdiscard really does a full device trim/wipe for my
Intel 530 SSD (240gb) with hexdump. I end up found that it fail to do
so because it report garbage info on its block limits VPD.

[tom@localhost ~]$ sudo sg_vpd -p 0xb0 /dev/sda
Block limits VPD page (SBC):
  Write same non-zero (WSNZ): 0
  Maximum compare and write length: 0 blocks
  Optimal transfer length granularity: 1 blocks
  Maximum transfer length: 0 blocks
  Optimal transfer length: 0 blocks
  Maximum prefetch length: 0 blocks
  Maximum unmap LBA count: 0
  Maximum unmap block descriptor count: 0
  Optimal unmap granularity: 1
  Unmap granularity alignment valid: 0
  Unmap granularity alignment: 0
  Maximum write same length: 0x3fffc0 blocks
  Maximum atomic transfer length: 0
  Atomic alignment: 0
  Atomic transfer length granularity: 0

The fact is, in each iteration, for this drive, blkdiscard can only
trim a maximum of 65528 sectors, which is the largest multiple of 8
sectors, which is the minimum possible. (In fact 65535 sectors seems
to be some sort of limit of ATA TRIM commands. It's also the maximum
of WRITE SAME (10).)

With `blkdiscard`, I can still workaround this with the "--step"
option, but for discard mount options, I don't think I have any way to
deal with it. Although it's basically the reponsibility of Intel (Well
I tried to find a way to complain to them but even their website is
borked as always), but still I would like to know why the kernel
doesn't allow "discard_granularity" and "discard_max_bytes" to be
configurable for users. Also wanna share about this in case nobody
ever noticed.


Another case is, I have a SanDisk Extreme USB (32gb, w/o UASP), which
is sort of a SATA SSD with USB bridge. I can basically TRIM the drive
with hdparm (although there's a minor issue:
https://sourceforge.net/p/hdparm/bugs/63/), but I can't do it with
blkdiscard. At first I thought it was only because its VPD doesn't
provide limits of discard:

[tom@localhost ~]$ sudo sg_vpd -p 0xb0 /dev/sdc
Block limits VPD page (SBC):
  Write same non-zero (WSNZ): 0
  Maximum compare and write length: 0 blocks
  Optimal transfer length granularity: 0 blocks
  Maximum transfer length: 8388607 blocks
  Optimal transfer length: 8388607 blocks
  Maximum prefetch length: 0 blocks

but then I try to use the attached patch to force some value for it,
blkdiscard still gives me "I/O error" instead of "NOT SUPPROTED". So
what's the requirement of the current discard way in the kernel? Does
it work differently on SATA drives than on USB drives because of the
controller (like involvement of libata)? Does it require capability of
one of the three scsi unmap ways? Or did I just miss some other dirty
hacking bits?

Comments

Martin K. Petersen June 21, 2015, 12:20 a.m. UTC | #1
>>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes:

Tom> Today I check if blkdiscard really does a full device trim/wipe for
Tom> my Intel 530 SSD (240gb) with hexdump. I end up found that it fail
Tom> to do so because it report garbage info on its block limits VPD.

It is a SATA-attached drive, it has no block limits VPD. What you are
seeing is information prepared by libata's SATL.

Tom> The fact is, in each iteration, for this drive, blkdiscard can only
Tom> trim a maximum of 65528 sectors, which is the largest multiple of 8
Tom> sectors, which is the minimum possible.

  Maximum write same length: 0x3fffc0 blocks

That's 2GB-e per request for a drive with 512-byte logical blocks.

Tom> but still I would like to know why the kernel doesn't allow
Tom> "discard_granularity" and "discard_max_bytes" to be configurable
Tom> for users.

Because if the vendor got these trivial values wrong there is little to
no chance that they implemented discard correctly in their firmware.
Tom Yan June 21, 2015, 7:03 a.m. UTC | #2
On 21 June 2015 at 08:20, Martin K. Petersen <martin.petersen@oracle.com> wrote:
> It is a SATA-attached drive, it has no block limits VPD. What you are
> seeing is information prepared by libata's SATL.
>
> Because if the vendor got these trivial values wrong there is little to
> no chance that they implemented discard correctly in their firmware.

I don't get it. So there's a chance that the VPDs is not purely from
the hardware? Then how can I differentiate them? But then you said
"the vendor got these trivial values wrong", so were you talking about
this drive or just for real SCSI drives?
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
Tom Yan June 21, 2015, 8:05 a.m. UTC | #3
By the way do you think it could be a bug of libata's SATL anyway?
Like perhaps it should break a single unmap request to multiple ATA
commands? I am not totally sure about it but it looks like there's a
limit of addressed blocks in a single ATA DSM/TRIM command (4 bytes,
which is 65535).

On 21 June 2015 at 15:03, Tom Yan <tom.ty89@gmail.com> wrote:
> On 21 June 2015 at 08:20, Martin K. Petersen <martin.petersen@oracle.com> wrote:
>> It is a SATA-attached drive, it has no block limits VPD. What you are
>> seeing is information prepared by libata's SATL.
>>
>> Because if the vendor got these trivial values wrong there is little to
>> no chance that they implemented discard correctly in their firmware.
>
> I don't get it. So there's a chance that the VPDs is not purely from
> the hardware? Then how can I differentiate them? But then you said
> "the vendor got these trivial values wrong", so were you talking about
> this drive or just for real SCSI drives?
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
Tom Yan June 21, 2015, 12:36 p.m. UTC | #4
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/ata/libata-scsi.c?id=18f0f97850059303ed73b1f02084f55ca330a80c

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/ata/libata-scsi.c?id=e78db4dfb1355a895f7ea50133b702b55b8ed184

So I think I found the answer myself. But I don't see the rationale
here at all. So values (arbitrary?) of granularity and maximum has
been hardcoded in the kernel for ALL devices which make use of libata,
and the parameter cannot be changed by user in runtime? So does it
expect all devices to be of the same spec? Or is it just like a
draft/example which people can only consider it as "may work"?

On 21 June 2015 at 16:05, Tom Yan <tom.ty89@gmail.com> wrote:
> By the way do you think it could be a bug of libata's SATL anyway?
> Like perhaps it should break a single unmap request to multiple ATA
> commands? I am not totally sure about it but it looks like there's a
> limit of addressed blocks in a single ATA DSM/TRIM command (4 bytes,
> which is 65535).
>
> On 21 June 2015 at 15:03, Tom Yan <tom.ty89@gmail.com> wrote:
>> On 21 June 2015 at 08:20, Martin K. Petersen <martin.petersen@oracle.com> wrote:
>>> It is a SATA-attached drive, it has no block limits VPD. What you are
>>> seeing is information prepared by libata's SATL.
>>>
>>> Because if the vendor got these trivial values wrong there is little to
>>> no chance that they implemented discard correctly in their firmware.
>>
>> I don't get it. So there's a chance that the VPDs is not purely from
>> the hardware? Then how can I differentiate them? But then you said
>> "the vendor got these trivial values wrong", so were you talking about
>> this drive or just for real SCSI drives?
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
Tom Yan June 21, 2015, 8:30 p.m. UTC | #5
After rechecking the result of blkdiscard (without specifying --step)
with hexdump, I see that the single ioctl is actually broken into
multiple TRIM commands/ranges by 65535 sector. I wonder if it will be
further tuned by ANY KIND of granularity though. Though it will
probably only be a problem for device TRIM, and for that you can
probably always rely on blkdiscard with --step (and avoid device TRIM
with things like mkfs.btrfs) so I guess it won't be a big problem. And
it will be fine for filesystem TRIM anyway I guess.

So it seems that "Maximum write same length: 0x3fffc0 blocks" doesn't
really matter (In fact when I ran `sg_write_same -U --lba=0
--num=0x7fffffff`, the command also trim the first 65528 sectors of
the drive). However, I am curious what is the meaning of "Our current
TRIM payload is a single sector that can accommodate 64 * 65535 blocks
being unmapped. Report this value in the Block Limits Maximum Unmap
LBA count field." in the commit message. What does the value actually
mean/affect? Could it cause trouble to certain drives?

By the way I think I got the answer for my USB TRIM question.
Basically for USB->SATA drives the SATL is implemented in the bridges,
so it must be able to "SAT" one of the three SCSI unmap methods to ATA
TRIM, just like libata does, so that it can handle whatever requested
through the scsi disk driver, otherwise one can only TRIM it through
SCSI's ATA pass-through commands if the bridge supports them, e.g.
hdparm. So my dirty hacking was silly.

P.S.:

[tom@localhost ~]$ sudo sg_vpd -p ai /dev/sda
ATA information VPD page:
  SAT Vendor identification: linux
  SAT Product identification: libata
  SAT Product revision level: 3.00

So it actually tells. And I hope that the kernel wouldn't "falsify"
anything for devices which do provide some VPD(s). : \

Another P.S.: I made a mistake in one of my last mails. 65535 is two
bytes, not four bytes. : /

On 21 June 2015 at 20:36, Tom Yan <tom.ty89@gmail.com> wrote:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/ata/libata-scsi.c?id=18f0f97850059303ed73b1f02084f55ca330a80c
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/ata/libata-scsi.c?id=e78db4dfb1355a895f7ea50133b702b55b8ed184
>
> So I think I found the answer myself. But I don't see the rationale
> here at all. So values (arbitrary?) of granularity and maximum has
> been hardcoded in the kernel for ALL devices which make use of libata,
> and the parameter cannot be changed by user in runtime? So does it
> expect all devices to be of the same spec? Or is it just like a
> draft/example which people can only consider it as "may work"?
>
> On 21 June 2015 at 16:05, Tom Yan <tom.ty89@gmail.com> wrote:
>> By the way do you think it could be a bug of libata's SATL anyway?
>> Like perhaps it should break a single unmap request to multiple ATA
>> commands? I am not totally sure about it but it looks like there's a
>> limit of addressed blocks in a single ATA DSM/TRIM command (4 bytes,
>> which is 65535).
>>
>> On 21 June 2015 at 15:03, Tom Yan <tom.ty89@gmail.com> wrote:
>>> On 21 June 2015 at 08:20, Martin K. Petersen <martin.petersen@oracle.com> wrote:
>>>> It is a SATA-attached drive, it has no block limits VPD. What you are
>>>> seeing is information prepared by libata's SATL.
>>>>
>>>> Because if the vendor got these trivial values wrong there is little to
>>>> no chance that they implemented discard correctly in their firmware.
>>>
>>> I don't get it. So there's a chance that the VPDs is not purely from
>>> the hardware? Then how can I differentiate them? But then you said
>>> "the vendor got these trivial values wrong", so were you talking about
>>> this drive or just for real SCSI drives?
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
Martin K. Petersen June 22, 2015, 8:57 p.m. UTC | #6
>>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes:

Tom> However, I am curious what is the meaning of "Our current TRIM
Tom> payload is a single sector that can accommodate 64 * 65535 blocks
Tom> being unmapped. Report this value in the Block Limits Maximum Unmap
Tom> LBA count field." in the commit message. What does the value
Tom> actually mean/affect? Could it cause trouble to certain drives?

DSM Trim takes a payload of <block nr, block count> ranges. The block
count is constrained to 65535 by virtue of being 16 bits. We can fit 64
8-byte range descriptors in a single 512-byte sector payload. That's why
we set the limit for a single, contiguous discard request to
0x3fffc0. That results in 2GB minus change for a single command.

Tom> By the way I think I got the answer for my USB TRIM question.
Tom> Basically for USB->SATA drives the SATL is implemented in the
Tom> bridges, so it must be able to "SAT" one of the three SCSI unmap
Tom> methods to ATA TRIM, just like libata does,

Yep. I have set to see one that gets that right, though. There are even
many enterprise SAS/RAID controllers that don't support UNMAP-TRIM
translation.

Linux' libata was one of the first implementations of SCSI-ATA
translation for TRIM. Initially we bent the rules for UNMAP a bit. When
the standards caught up we switched to WRITE SAME.

Tom> So it actually tells. And I hope that the kernel wouldn't "falsify"
Tom> anything for devices which do provide some VPD(s). : \

SATA doesn't have VPDs.
Tom Yan June 23, 2015, 2:16 p.m. UTC | #7
Still, I wonder if the kernel should allow the user to configure the
real granularity and send ATA commands that are rounded off by it
(which works just like --step in blkdiscard). For example,

(64 * 65535) % 8 = 0

but

65535 % 8 = 7

So the block count limit for the ATA commands would be 65535 - (65535
% real_gran) per range.


Also, by experimenting on my SanDisk Extreme USB with hdparm TRIM and
hexdump, I can see that the maximum block/sector allowed to be
discarded per ATA command is 65536, inspite of the payload size. I
don't know whether the USB bridging or the way hdparm does TRIM
matters, but it seems that some devices can't really handle limit like
0x3fffc0 blocks.

So I still think that the kernel should allow users to configure
limits that can make libata send reliable ATA TRIM commands.

On 23 June 2015 at 04:57, Martin K. Petersen <martin.petersen@oracle.com> wrote:
>
> Tom> So it actually tells. And I hope that the kernel wouldn't "falsify"
> Tom> anything for devices which do provide some VPD(s). : \
>
> SATA doesn't have VPDs.
>

I know now, that's why I think it's "alright" for libata to do so. I
just don't know whether the kernel would tamper with them for devices
like USB or real SCSI drives.
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Martin K. Petersen June 23, 2015, 3:36 p.m. UTC | #8
>>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes:

Tom> I don't know whether the USB bridging or the way hdparm does TRIM
Tom> matters, but it seems that some devices can't really handle limit
Tom> like 0x3fffc0 blocks.

The 0x3fffc0 limit is for SATA devices connected through Linux' libata
SCSI ATA translation. This number corresponds to the biggest range we
can discard while maintaining a 1:1 mapping between the SCSI command and
the ATA ditto.

If you connect the SATA drive to a USB/UAS bridge you are entirely at
the bridge's whim when it comes to translation of WRITE SAME(10/16) w/
UNMAP or the UNMAP command to DSM TRIM. libata is not involved and
neither is the 0x3fffc0 limit (although chances are that the drive has
the same limit internally). Even when using hdparm the bridge device
still needs to correctly translate the ATA PASSTHROUGH commands.

Tom> So I still think that the kernel should allow users to configure
Tom> limits that can make libata send reliable ATA TRIM commands.

You haven't given us any reason to. We are not aware of any ATA drives
that put constraints on the range block count.

Again, if you want to help please focus on your bridge devices and what,
if anything, can be done to uniquely identify them and override any
incorrect values they might report to the SCSI stack. Up to and
including us disabling discard entirely on these devices if their
translation isn't verifiable and bullet proof.
Tom Yan June 23, 2015, 4:41 p.m. UTC | #9
On 23 June 2015 at 23:36, Martin K. Petersen <martin.petersen@oracle.com> wrote:
>
> You haven't given us any reason to. We are not aware of any ATA drives
> that put constraints on the range block count.
>

What I have been doing is trying to show you example of those
constraints. When I talked about the block count limit of the SanDisk
Extreme USB, I am not asking you to fix anything for the drive I HAVE
(because I can only use hdparm to TRIM it anyway), but to show you
that some devices MIGHT have such limit. I just can't be 100% sure of
it, even though from what I see in the different results on payload
size and range sector count, I don't think the bridge is intervening
at all (but just the limit of the SATA controller behind).

But for my Intel 530 SSD, the granularity problem is obvious and solid
enough because it's connected directly with SATA. Currently what the
kernel does is assume all devices support 1 sector granularity. And
the problem doesn't even lie on the current "hardcoded" granularity in
libata, because blkdev_issue_discard() only does "mod check" against
the granularity on the maximum sector counts, so even if it's allowed
to be configured, it wouldn't really help. And I have yet to see
anything which actually does "mod check" against ANY granularity on
the sector count per range. This is what the example in my last mail
was about.

And the only info I have ever saw about TRIM block limit of a SATA
drive is in `hdparm -I`. For my Intel SSD, which actually has a lower
limit of 8 sectors, it shows:
Data Set Management TRIM supported (limit 1 block)
while for the SanDisk Extreme USB, which actually has a lower limit of
256 sectors, it shows:
Data Set Management TRIM supported (limit 8 blocks)
Inspite of the accuracy, I don't see the kernel actually read this limit.
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Martin K. Petersen June 23, 2015, 5:03 p.m. UTC | #10
>>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes:

Tom> Currently what the kernel does is assume all devices support 1
Tom> sector granularity.

The ATA Command Set does not allow for any other granularity than 1
sector.

Bigger granularity reporting is supported in SCSI SBC to allow for
thinly provisioned disk arrays with a bigger allocation unit. It is
common for disk arrays to provision LUN space in units of 1MB or more.

Tom> For my Intel SSD, which actually has a lower limit of 8 sectors, it
Tom> shows: Data Set Management TRIM supported (limit 1 block) while for
Tom> the SanDisk Extreme USB, which actually has a lower limit of 256
Tom> sectors, it shows: Data Set Management TRIM supported (limit 8
Tom> blocks)

It don't know what these "lower limits" you are talking about are.

What hdparm tells you is that DSM TRIM on the Intel drive supports 1
block (512 bytes) of range payload data. Whereas the SanDisk drive
supports a full 4KB page of range payload data (8 * 512 bytes).

We did try to enable payloads bigger than 512 bytes back in the day but
most drives that reported supporting the bigger payload actually didn't
and all hell broke loose. So we reverted to a single sector payload for
libata.

I still have the payload patch in my SSD test branch and regularly test
with it. But there are still drives that fail in mysterious ways with it
in place and so far I haven't felt compelled to maintain yet another
blacklist.
Tom Yan June 23, 2015, 5:24 p.m. UTC | #11
On 24 June 2015 at 01:03, Martin K. Petersen <martin.petersen@oracle.com> wrote:
>
> It don't know what these "lower limits" you are talking about are.
>

[tom@localhost ~]$ sudo shred -v -n 1 /dev/sda3

[tom@localhost ~]$ sudo blkdiscard -l 512 /dev/sda3
[tom@localhost ~]$ sudo hexdump -n 4096 /dev/sda3 | head
0000000 58c7 f3b0 0f97 2a9a 2da8 7c44 11f5 7852
0000010 4f1d d19c 682b a3d8 6be9 61df 4eb3 a4ee
0000020 f28b f2f6 ef36 c244 f1b0 eadd 5ad0 c5ec
0000030 54c4 abb4 f9c3 7ca0 d807 63ad 81cd 45e6
0000040 2b30 709c 550a 5e52 6928 4468 78c9 f671
0000050 eef1 dfa7 2a6d 3150 64f8 f9a2 240d bbf4
0000060 2cff 02e0 c893 16a7 67c2 8cd9 af76 71b6
0000070 4740 83cc 5b15 3db0 64b2 1a6d 3f40 5a98
0000080 09d0 035c b5ae 2cd2 1850 ea70 7a09 0491
0000090 6b27 9d81 9d74 a087 afd9 e75a 85ea adf6

[tom@localhost ~]$ sudo blkdiscard -l 3584 /dev/sda3
[tom@localhost ~]$ sudo hexdump -n 4096 /dev/sda3 | head
0000000 58c7 f3b0 0f97 2a9a 2da8 7c44 11f5 7852
0000010 4f1d d19c 682b a3d8 6be9 61df 4eb3 a4ee
0000020 f28b f2f6 ef36 c244 f1b0 eadd 5ad0 c5ec
0000030 54c4 abb4 f9c3 7ca0 d807 63ad 81cd 45e6
0000040 2b30 709c 550a 5e52 6928 4468 78c9 f671
0000050 eef1 dfa7 2a6d 3150 64f8 f9a2 240d bbf4
0000060 2cff 02e0 c893 16a7 67c2 8cd9 af76 71b6
0000070 4740 83cc 5b15 3db0 64b2 1a6d 3f40 5a98
0000080 09d0 035c b5ae 2cd2 1850 ea70 7a09 0491
0000090 6b27 9d81 9d74 a087 afd9 e75a 85ea adf6

[tom@localhost ~]$ sudo blkdiscard -l 4096 /dev/sda3
[tom@localhost ~]$ sudo hexdump -n 4096 /dev/sda3 | head
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0001000

So when libata issue ATA commands with ranges of 65535 sectors, only
65535-(65535%8) = 65528 sectors are discarded, yet it wouldn't know
about that so the next range will start from LBA 65536. I can
workaround this by specifying --step in blkdiscard, but I think the
kernel should have a param configurable for general.

>
> What hdparm tells you is that DSM TRIM on the Intel drive supports 1
> block (512 bytes) of range payload data. Whereas the SanDisk drive
> supports a full 4KB page of range payload data (8 * 512 bytes).
>

Oh so they are not reporting garbage. That's good to know. I guess
that's why hdparm actually splits the range list I provided by
512-range per ATA command (which works) for the SanDisk drive.

>
> We did try to enable payloads bigger than 512 bytes back in the day but
> most drives that reported supporting the bigger payload actually didn't
> and all hell broke loose. So we reverted to a single sector payload for
> libata.
>
> I still have the payload patch in my SSD test branch and regularly test
> with it. But there are still drives that fail in mysterious ways with it
> in place and so far I haven't felt compelled to maintain yet another
> blacklist.
>

For now I have no problem with the single sector payload thing.
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Martin K. Petersen June 23, 2015, 6:26 p.m. UTC | #12
>>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes:

Tom> So when libata issue ATA commands with ranges of 65535 sectors,
Tom> only 65535-(65535%8) = 65528 sectors are discarded,

That's unfortunate but TRIM is advisory so the drive is free to ignore
all or parts of the request.

What happens if you discard sectors 0-6 and then sector 7?

Tom> I can workaround this by specifying --step in blkdiscard, but I
Tom> think the kernel should have a param configurable for general.

This is on the Intel 530? What does the drive report in
/sys/block/sdN/queue/discard_zeroes_data?
Tom Yan June 23, 2015, 9:25 p.m. UTC | #13
First of all let me add another "statistic" about the issue:

[tom@localhost ~]$ sudo shred -n 1 /dev/sda3
[tom@localhost ~]$ sudo blkdiscard /dev/sda3
[tom@localhost ~]$ sudo hexdump /dev/sda3 | wc -l
310635
[tom@localhost ~]$ sudo hexdump /dev/sda3 | pcregrep -M '0000 0000
0000 0000 0000 0000 0000 0000\n\*' | wc -l
2410

total untrimmed sectors: (310635 - 1 - 2410) / 32 = 9632
total ranges: (43 GiB * (1024 ^ 3) / 512) / 65535 ~= 1376
average untrimmed sectors per range: (9632 / 1376) = 7 = (65535 % 8)

Also, FWIW:

[tom@localhost ~]$ sudo shred -n 1 /dev/sda3
[tom@localhost ~]$ let step=(65535-65535%8)*512
[tom@localhost ~]$ echo $step
33550336
[tom@localhost ~]$ sudo blkdiscard -p "$step" /dev/sda3
[tom@localhost ~]$ sudo hexdump /dev/sda3
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
ac0000000

On 24 June 2015 at 02:26, Martin K. Petersen <martin.petersen@oracle.com> wrote:
>
> What happens if you discard sectors 0-6 and then sector 7?
>

[tom@localhost ~]$ sudo shred -n 1 /dev/sda3
[tom@localhost ~]$ sudo blkdiscard -l 3584 /dev/sda3
[tom@localhost ~]$ sudo blkdiscard -o 3584 -l 512 /dev/sda3
[tom@localhost ~]$ sudo hexdump -n 4096 /dev/sda3
0000000 f06d 8365 5e1b 616c 7362 4d61 2182 02fb
...
0000ff0 54ef 9579 51bc 9042 115a 375e c28f 4dcc
0001000

>
> This is on the Intel 530? What does the drive report in
> /sys/block/sdN/queue/discard_zeroes_data?
>

Yes. It reports 0. In `hdparm -I`:
       *    Data Set Management TRIM supported (limit 1 block)
       *    Deterministic read data after TRIM

You may also want to check out and compare the two attached test case files.
diff mbox

Patch

diff --git a/drivers/scsi/sd.c~ b/drivers/scsi/sd.c
index a661d33..a9cdfb6 100644
--- a/drivers/scsi/sd.c~
+++ b/drivers/scsi/sd.c
@@ -99,6 +99,7 @@  MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
 #endif
 
 static void sd_config_discard(struct scsi_disk *, unsigned int);
+static void sd_config_discard_novpd(struct scsi_disk *);
 static void sd_config_write_same(struct scsi_disk *);
 static int  sd_revalidate_disk(struct gendisk *);
 static void sd_unlock_native_capacity(struct gendisk *disk);
@@ -700,6 +701,22 @@  static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
 	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
 }
 
+static void sd_config_discard_novpd(struct scsi_disk *sdkp)
+{
+	struct request_queue *q = sdkp->disk->queue;
+	unsigned int logical_block_size = sdkp->device->sector_size;
+	unsigned int max_blocks = (u32)SD_MAX_WS16_BLOCKS;
+
+	q->limits.discard_zeroes_data = 0;
+	q->limits.discard_alignment = 0;
+	q->limits.discard_granularity = sdkp->physical_block_size;
+
+	sdkp->provisioning_mode = SD_LBP_WS16;
+
+	q->limits.max_discard_sectors = max_blocks * (logical_block_size >> 9);
+	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
+}
+
 /**
  * sd_setup_discard_cmnd - unmap blocks on thinly provisioned device
  * @sdp: scsi device to operate one
@@ -2776,6 +2793,9 @@  static int sd_revalidate_disk(struct gendisk *disk)
 			sd_read_block_limits(sdkp);
 			sd_read_block_characteristics(sdkp);
 		}
+		else {
+			sd_config_discard_novpd(sdkp);
+		}
 
 		sd_read_write_protect_flag(sdkp, buffer);
 		sd_read_cache_type(sdkp, buffer);