Patchwork mke2fs: restore verbose message for BLKDISCARD

login
register
mail settings
Submitter Stephen Warren
Date March 5, 2013, 8:25 p.m.
Message ID <1362515118-30344-1-git-send-email-swarren@wwwdotorg.org>
Download mbox | patch
Permalink /patch/225172/
State Rejected
Headers show

Comments

Stephen Warren - March 5, 2013, 8:25 p.m.
From: Stephen Warren <swarren@nvidia.com>

mke2fs on a large slow eMMC device may appear to hang while executing
ioctl(BLKDISCARD). CTRL-C and CTRL-\ don't appear to respond, or respond
extremely slowly. -v doesn't give any hints what's happening. Only strace
is a clue. Make -v print some clues to make it easier to track down the
apparent hang.

This change re-uses the original messages that were implemented as part
of 5827d24 "mke2fs support for BLKDISCARD" in order to easily re-use the
translations of that message. Note that this patch prints the first
message before executing the IOCTL, so the user is told what's going on
before the long wait.

Signed-off-by: Stephen Warren <swarren@nvidia.com>
---
Note: I've left mke2fs running for a while now (tens of minutes on a 16GiB
eMMC), and this IOCTL still hasn't completed. I'm not sure how long it's
meant to take. Perhaps the long execution time is actually a kernel bug in
the MMC core or our SDHCI driver.
---
 misc/mke2fs.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)
Lukas Czerner - March 6, 2013, 7:13 a.m.
On Tue, 5 Mar 2013, Stephen Warren wrote:

> Date: Tue,  5 Mar 2013 13:25:18 -0700
> From: Stephen Warren <swarren@wwwdotorg.org>
> To: Theodore Ts'o <tytso@mit.edu>
> Cc: Chris Ball <cjb@laptop.org>, linux-ext4@vger.kernel.org,
>     Stephen Warren <swarren@nvidia.com>
> Subject: [PATCH] mke2fs: restore verbose message for BLKDISCARD
> 
> From: Stephen Warren <swarren@nvidia.com>
> 
> mke2fs on a large slow eMMC device may appear to hang while executing
> ioctl(BLKDISCARD). CTRL-C and CTRL-\ don't appear to respond, or respond
> extremely slowly. -v doesn't give any hints what's happening. Only strace
> is a clue. Make -v print some clues to make it easier to track down the
> apparent hang.
> 
> This change re-uses the original messages that were implemented as part
> of 5827d24 "mke2fs support for BLKDISCARD" in order to easily re-use the
> translations of that message. Note that this patch prints the first
> message before executing the IOCTL, so the user is told what's going on
> before the long wait.
> 
> Signed-off-by: Stephen Warren <swarren@nvidia.com>

Note that you've added the message only around the first 4k discard
which is only used to test whether the device actually support the
discard... which is not very useful and it can not even know whether
the whole device discard will succeed/fail so it is misleading.

Moreover there actually is a message saying that we're "Discarding
device blocks" and it even shows the progress. The step in which
we're doing the discard (and update the progress) is given by
DISCARD_STEP_MB = 2048MB. And all that in non verbose mode as well.

So what's happening in your case is that your message will not be
written to the output stream (no newline) for some reason even though
that we call fflush() after fputs(). I am not sure how to fix it
though since I always find this stream buffering and fflush quite
confusing and it never worked properly for me...

-Lukas

> ---
> Note: I've left mke2fs running for a while now (tens of minutes on a 16GiB
> eMMC), and this IOCTL still hasn't completed. I'm not sure how long it's
> meant to take. Perhaps the long execution time is actually a kernel bug in
> the MMC core or our SDHCI driver.

As mentioned we're doing 2GB discard steps so the problem really
should not be on mkfs side. It is very much possible that the
discard implementation on the eMMC is terribly slow (which does not
surprise me). What is says about itself ?

grep . /sys/block/sda/queue/discard_*

> ---
>  misc/mke2fs.c |   13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/misc/mke2fs.c b/misc/mke2fs.c
> index bbf477a..defb932 100644
> --- a/misc/mke2fs.c
> +++ b/misc/mke2fs.c
> @@ -2272,7 +2272,20 @@ static int mke2fs_discard_device(ext2_filsys fs)
>  	 * we do not print numeric progress resulting in failure
>  	 * afterwards.
>  	 */
> +	if (verbose) {
> +		printf(_("Calling BLKDISCARD from %llu to %llu "),
> +			(unsigned long long)0,
> +			(unsigned long long)fs->blocksize);
> +		fflush(stdout);
> +	}
>  	retval = io_channel_discard(fs->io, 0, fs->blocksize);
> +	if (verbose) {
> +		if (retval)
> +			printf(_("failed.\n"));
> +		else
> +			printf(_("succeeded.\n"));
> +	}
> +
>  	if (retval)
>  		return retval;
>  	cur = fs->blocksize;
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stephen Warren - March 7, 2013, 7:47 p.m.
On 03/06/2013 12:13 AM, Lukáš Czerner wrote:
> On Tue, 5 Mar 2013, Stephen Warren wrote:
> 
>> Date: Tue,  5 Mar 2013 13:25:18 -0700
>> From: Stephen Warren <swarren@wwwdotorg.org>
>> To: Theodore Ts'o <tytso@mit.edu>
>> Cc: Chris Ball <cjb@laptop.org>, linux-ext4@vger.kernel.org,
>>     Stephen Warren <swarren@nvidia.com>
>> Subject: [PATCH] mke2fs: restore verbose message for BLKDISCARD
>>
>> From: Stephen Warren <swarren@nvidia.com>
>>
>> mke2fs on a large slow eMMC device may appear to hang while executing
>> ioctl(BLKDISCARD). CTRL-C and CTRL-\ don't appear to respond, or respond
>> extremely slowly. -v doesn't give any hints what's happening. Only strace
>> is a clue. Make -v print some clues to make it easier to track down the
>> apparent hang.
>>
>> This change re-uses the original messages that were implemented as part
>> of 5827d24 "mke2fs support for BLKDISCARD" in order to easily re-use the
>> translations of that message. Note that this patch prints the first
>> message before executing the IOCTL, so the user is told what's going on
>> before the long wait.
>>
>> Signed-off-by: Stephen Warren <swarren@nvidia.com>
> 
> Note that you've added the message only around the first 4k discard
> which is only used to test whether the device actually support the
> discard... which is not very useful and it can not even know whether
> the whole device discard will succeed/fail so it is misleading.

Oops. I certainly should looked at the code better.

> Moreover there actually is a message saying that we're "Discarding
> device blocks" and it even shows the progress. The step in which
> we're doing the discard (and update the progress) is given by
> DISCARD_STEP_MB = 2048MB. And all that in non verbose mode as well.

That said, it's that very first call to io_channel_discard() that is
hanging then; I don't see the existing message at all. With my change, I
do see the new message that I added. (So, no stdio flushing issue here).

I will go investigate why this API is hanging. Sorry for the noise.

...
> As mentioned we're doing 2GB discard steps so the problem really
> should not be on mkfs side. It is very much possible that the
> discard implementation on the eMMC is terribly slow (which does not
> surprise me). What is says about itself ?
> 
> grep . /sys/block/sda/queue/discard_*

/sys/block/mmcblk1/queue/discard_granularity
0
/sys/block/mmcblk1/queue/discard_max_bytes
512
/sys/block/mmcblk1/queue/discard_zeroes_data
1

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lukas Czerner - March 8, 2013, 7:23 a.m.
On Thu, 7 Mar 2013, Stephen Warren wrote:

> Date: Thu, 07 Mar 2013 12:47:09 -0700
> From: Stephen Warren <swarren@wwwdotorg.org>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: Theodore Ts'o <tytso@mit.edu>, Chris Ball <cjb@laptop.org>,
>     linux-ext4@vger.kernel.org, Stephen Warren <swarren@nvidia.com>
> Subject: Re: [PATCH] mke2fs: restore verbose message for BLKDISCARD
> 
> On 03/06/2013 12:13 AM, Lukáš Czerner wrote:
> > On Tue, 5 Mar 2013, Stephen Warren wrote:
> > 
> >> Date: Tue,  5 Mar 2013 13:25:18 -0700
> >> From: Stephen Warren <swarren@wwwdotorg.org>
> >> To: Theodore Ts'o <tytso@mit.edu>
> >> Cc: Chris Ball <cjb@laptop.org>, linux-ext4@vger.kernel.org,
> >>     Stephen Warren <swarren@nvidia.com>
> >> Subject: [PATCH] mke2fs: restore verbose message for BLKDISCARD
> >>
> >> From: Stephen Warren <swarren@nvidia.com>
> >>
> >> mke2fs on a large slow eMMC device may appear to hang while executing
> >> ioctl(BLKDISCARD). CTRL-C and CTRL-\ don't appear to respond, or respond
> >> extremely slowly. -v doesn't give any hints what's happening. Only strace
> >> is a clue. Make -v print some clues to make it easier to track down the
> >> apparent hang.
> >>
> >> This change re-uses the original messages that were implemented as part
> >> of 5827d24 "mke2fs support for BLKDISCARD" in order to easily re-use the
> >> translations of that message. Note that this patch prints the first
> >> message before executing the IOCTL, so the user is told what's going on
> >> before the long wait.
> >>
> >> Signed-off-by: Stephen Warren <swarren@nvidia.com>
> > 
> > Note that you've added the message only around the first 4k discard
> > which is only used to test whether the device actually support the
> > discard... which is not very useful and it can not even know whether
> > the whole device discard will succeed/fail so it is misleading.
> 
> Oops. I certainly should looked at the code better.
> 
> > Moreover there actually is a message saying that we're "Discarding
> > device blocks" and it even shows the progress. The step in which
> > we're doing the discard (and update the progress) is given by
> > DISCARD_STEP_MB = 2048MB. And all that in non verbose mode as well.
> 
> That said, it's that very first call to io_channel_discard() that is
> hanging then; I don't see the existing message at all. With my change, I
> do see the new message that I added. (So, no stdio flushing issue here).

It's just one fs block discard, are you sure about that ?

> 
> I will go investigate why this API is hanging. Sorry for the noise.
> 
> ...
> > As mentioned we're doing 2GB discard steps so the problem really
> > should not be on mkfs side. It is very much possible that the
> > discard implementation on the eMMC is terribly slow (which does not
> > surprise me). What is says about itself ?
> > 
> > grep . /sys/block/sda/queue/discard_*
> 
> /sys/block/mmcblk1/queue/discard_granularity
> 0
> /sys/block/mmcblk1/queue/discard_max_bytes
> 512

Here is your problem. Doing discard of 16GB (or even one step of
2GB) on the slow device such the eMMC certainly is just 512B at a
time is expected to be *really* slow!

If you want to check out how fast/slow it actually is you can issue
discard directly on the device using blkdiscard toos which should be
part of the very recent util linux (not sure if it has been released
yet, but you can try git repository).

-Lukas

> /sys/block/mmcblk1/queue/discard_zeroes_data
> 1
> 
>
Stephen Warren - March 8, 2013, 5:18 p.m.
On 03/08/2013 12:23 AM, Lukáš Czerner wrote:
> On Thu, 7 Mar 2013, Stephen Warren wrote:
> 
>> Date: Thu, 07 Mar 2013 12:47:09 -0700
>> From: Stephen Warren <swarren@wwwdotorg.org>
>> To: Lukáš Czerner <lczerner@redhat.com>
>> Cc: Theodore Ts'o <tytso@mit.edu>, Chris Ball <cjb@laptop.org>,
>>     linux-ext4@vger.kernel.org, Stephen Warren <swarren@nvidia.com>
>> Subject: Re: [PATCH] mke2fs: restore verbose message for BLKDISCARD
>>
>> On 03/06/2013 12:13 AM, Lukáš Czerner wrote:
>>> On Tue, 5 Mar 2013, Stephen Warren wrote:
>>>
>>>> Date: Tue,  5 Mar 2013 13:25:18 -0700
>>>> From: Stephen Warren <swarren@wwwdotorg.org>
>>>> To: Theodore Ts'o <tytso@mit.edu>
>>>> Cc: Chris Ball <cjb@laptop.org>, linux-ext4@vger.kernel.org,
>>>>     Stephen Warren <swarren@nvidia.com>
>>>> Subject: [PATCH] mke2fs: restore verbose message for BLKDISCARD
>>>>
>>>> From: Stephen Warren <swarren@nvidia.com>
>>>>
>>>> mke2fs on a large slow eMMC device may appear to hang while executing
>>>> ioctl(BLKDISCARD). CTRL-C and CTRL-\ don't appear to respond, or respond
>>>> extremely slowly. -v doesn't give any hints what's happening. Only strace
>>>> is a clue. Make -v print some clues to make it easier to track down the
>>>> apparent hang.
>>>>
>>>> This change re-uses the original messages that were implemented as part
>>>> of 5827d24 "mke2fs support for BLKDISCARD" in order to easily re-use the
>>>> translations of that message. Note that this patch prints the first
>>>> message before executing the IOCTL, so the user is told what's going on
>>>> before the long wait.
>>>>
>>>> Signed-off-by: Stephen Warren <swarren@nvidia.com>
>>>
>>> Note that you've added the message only around the first 4k discard
>>> which is only used to test whether the device actually support the
>>> discard... which is not very useful and it can not even know whether
>>> the whole device discard will succeed/fail so it is misleading.
>>
>> Oops. I certainly should looked at the code better.
>>
>>> Moreover there actually is a message saying that we're "Discarding
>>> device blocks" and it even shows the progress. The step in which
>>> we're doing the discard (and update the progress) is given by
>>> DISCARD_STEP_MB = 2048MB. And all that in non verbose mode as well.
>>
>> That said, it's that very first call to io_channel_discard() that is
>> hanging then; I don't see the existing message at all. With my change, I
>> do see the new message that I added. (So, no stdio flushing issue here).
> 
> It's just one fs block discard, are you sure about that ?

Yes:

root@localhost:~# strace -e ioctl ./e2fsprogs/misc/mke2fs -v /dev/mmcblk1p1
mke2fs 1.43-WIP (21-Jan-2013)
ioctl(3, BLKGETSIZE64, 0xbe940110)      = 0
fs_types for mke2fs.conf resolution: 'ext2'
ioctl(3, BLKSSZGET, 0xbe940360)         = 0
ioctl(3, BLKPBSZGET, 0xbe940364)        = 0
ioctl(3, BLKDISCARDZEROES, 0xbe940324)  = 0
ioctl(3, BLKROGET, 0xbe940324)          = 0
Calling BLKDISCARD from 0 to 4096 ioctl(3, BLKDISCARD

>> I will go investigate why this API is hanging. Sorry for the noise.

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o - March 8, 2013, 7 p.m.
I hate to suggest this, but there's so many crappy devices out there
that I'm wondering if we need to figure out some way of maintaining a
black list of devices that don't handle discard properly.  For
example, the Sandisk U100 advertises a max discard granularity of 512
bytes, but I've been advised that if you don't use a discard
granularity of 256k, aligned on 256k, you'll be very, very, sorry.

It sounds like Stephen's device is probably an example of Yet Another
Busted Trim implementation.  The problem is that manufacturers will be
releasing more broken products faster than we can update a blacklist
in the kernel.  So any blacklist would have to be maintained online on
the web, and dynamically updated by distro installers.   :-(

    	     		 	    	   		 - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o - March 8, 2013, 7:06 p.m.
Here's an idea.  What if we set a timer when send the first probing
discard, and if it takes too long, we print an error message and then
try to recover.

More generally, if we can't trust a device's advertised discard
parameters, maybe it makes sense to create a userspace program that
benchmarks the device to determine various critical flash parameters,
including "does discard really work?"  "Is it hopelessly slow"', and
"what is the erase block size"?  This program can then send the
appropriate parameters to a filesystem's mkfs program.

   	      	      	  	   - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chris Ball - March 8, 2013, 7:08 p.m.
Hi,

On Fri, Mar 08 2013, Theodore Ts'o wrote:
> I hate to suggest this, but there's so many crappy devices out there
> that I'm wondering if we need to figure out some way of maintaining a
> black list of devices that don't handle discard properly.

Yes, we already do that in the MMC layer.  See e.g.:

http://git.kernel.org/cgit/linux/kernel/git/cjb/mmc.git/commit/?id=3550ccdb9d8d350e526b809bf3dd92b550a74fe1
("mmc: card: Skip secure erase on MoviNAND; causes unrecoverable corruption.")

Thanks,

- Chris.
Stephen Warren - March 8, 2013, 8:03 p.m.
On 03/08/2013 12:08 PM, Chris Ball wrote:
> Hi,
> 
> On Fri, Mar 08 2013, Theodore Ts'o wrote:
>> I hate to suggest this, but there's so many crappy devices out there
>> that I'm wondering if we need to figure out some way of maintaining a
>> black list of devices that don't handle discard properly.
> 
> Yes, we already do that in the MMC layer.  See e.g.:
> 
> http://git.kernel.org/cgit/linux/kernel/git/cjb/mmc.git/commit/?id=3550ccdb9d8d350e526b809bf3dd92b550a74fe1
> ("mmc: card: Skip secure erase on MoviNAND; causes unrecoverable corruption.")

Unrecoverable sounds scary. Does that just mean data corruption, or a
permanently broken device? mk2e2fs -K followed by e2fsck and mounting
luckily seems to still work OK for me:-)

FWIW, my CID is 45010053454d31364790711c665f3e00, i.e. manufacturer ID
0x45, device name SEM16G.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chris Ball - March 8, 2013, 8:12 p.m.
Hi,

On Fri, Mar 08 2013, Stephen Warren wrote:
>> On Fri, Mar 08 2013, Theodore Ts'o wrote:
>>> I hate to suggest this, but there's so many crappy devices out there
>>> that I'm wondering if we need to figure out some way of maintaining a
>>> black list of devices that don't handle discard properly.
>> 
>> Yes, we already do that in the MMC layer.  See e.g.:
>> 
>> http://git.kernel.org/cgit/linux/kernel/git/cjb/mmc.git/commit/?id=3550ccdb9d8d350e526b809bf3dd92b550a74fe1
>> ("mmc: card: Skip secure erase on MoviNAND; causes unrecoverable corruption.")
>
> Unrecoverable sounds scary. Does that just mean data corruption, or a
> permanently broken device? mk2e2fs -K followed by e2fsck and mounting
> luckily seems to still work OK for me:-)

Incredibly, it appears to cause a permanently broken device.  The URLs
in the commit message have some more details.

- Chris.
Eric Sandeen - March 11, 2013, 2:08 p.m.
On 3/8/13 1:00 PM, Theodore Ts'o wrote:
> I hate to suggest this, but there's so many crappy devices out there
> that I'm wondering if we need to figure out some way of maintaining a
> black list of devices that don't handle discard properly.  For
> example, the Sandisk U100 advertises a max discard granularity of 512
> bytes, but I've been advised that if you don't use a discard
> granularity of 256k, aligned on 256k, you'll be very, very, sorry.

> It sounds like Stephen's device is probably an example of Yet Another
> Busted Trim implementation.  The problem is that manufacturers will be
> releasing more broken products faster than we can update a blacklist
> in the kernel.  So any blacklist would have to be maintained online on
> the web, and dynamically updated by distro installers.   :-(

Yeah I think it may be time for a blacklist.

TBH I think we should revisit discard-at-mkfs-time by default as well.

It seemed like a decent idea at the time, but now we have handy
fstrim as well as online/dynamic/realtime trim, and trim at mkfs
makes mke2fs completely un-doable in the event of fat fingers.

-Eric


> 
>     	     		 	    	   		 - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lukas Czerner - March 11, 2013, 2:18 p.m.
On Mon, 11 Mar 2013, Eric Sandeen wrote:

> Date: Mon, 11 Mar 2013 09:08:20 -0500
> From: Eric Sandeen <sandeen@redhat.com>
> To: Theodore Ts'o <tytso@mit.edu>
> Cc: Stephen Warren <swarren@wwwdotorg.org>,
>     Lukáš Czerner <lczerner@redhat.com>, Chris Ball <cjb@laptop.org>,
>     linux-ext4@vger.kernel.org, Stephen Warren <swarren@nvidia.com>
> Subject: Re: [PATCH] mke2fs: restore verbose message for BLKDISCARD
> 
> On 3/8/13 1:00 PM, Theodore Ts'o wrote:
> > I hate to suggest this, but there's so many crappy devices out there
> > that I'm wondering if we need to figure out some way of maintaining a
> > black list of devices that don't handle discard properly.  For
> > example, the Sandisk U100 advertises a max discard granularity of 512
> > bytes, but I've been advised that if you don't use a discard
> > granularity of 256k, aligned on 256k, you'll be very, very, sorry.
> 
> > It sounds like Stephen's device is probably an example of Yet Another
> > Busted Trim implementation.  The problem is that manufacturers will be
> > releasing more broken products faster than we can update a blacklist
> > in the kernel.  So any blacklist would have to be maintained online on
> > the web, and dynamically updated by distro installers.   :-(
> 
> Yeah I think it may be time for a blacklist.
> 
> TBH I think we should revisit discard-at-mkfs-time by default as well.
> 
> It seemed like a decent idea at the time, but now we have handy
> fstrim as well as online/dynamic/realtime trim, and trim at mkfs
> makes mke2fs completely un-doable in the event of fat fingers.

However that will make things even worse when someone uses it on
such device, than simply realising the problem on mkfs time. Sure,
there are buggy or crappy devices out there, but that's not we want
to optimize for right ?

Regarding the 'fat fingers' point I am starting to thing that
refusing to create the file system if we found any old signature
(like xfs does) would really be a good thing to have. In order not
to break scripts we can fall back to the old behaviour if we find
out that there is not any interactive input attached (we're running
from the script).

-Lukas

> 
> -Eric
> 
> 
> > 
> >     	     		 	    	   		 - Ted
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 
>

Patch

diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index bbf477a..defb932 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -2272,7 +2272,20 @@  static int mke2fs_discard_device(ext2_filsys fs)
 	 * we do not print numeric progress resulting in failure
 	 * afterwards.
 	 */
+	if (verbose) {
+		printf(_("Calling BLKDISCARD from %llu to %llu "),
+			(unsigned long long)0,
+			(unsigned long long)fs->blocksize);
+		fflush(stdout);
+	}
 	retval = io_channel_discard(fs->io, 0, fs->blocksize);
+	if (verbose) {
+		if (retval)
+			printf(_("failed.\n"));
+		else
+			printf(_("succeeded.\n"));
+	}
+
 	if (retval)
 		return retval;
 	cur = fs->blocksize;