Message ID | 1362515118-30344-1-git-send-email-swarren@wwwdotorg.org |
---|---|
State | Rejected, archived |
Headers | show |
On Tue, 5 Mar 2013, Stephen Warren wrote: > Date: Tue, 5 Mar 2013 13:25:18 -0700 > From: Stephen Warren <swarren@wwwdotorg.org> > To: Theodore Ts'o <tytso@mit.edu> > Cc: Chris Ball <cjb@laptop.org>, linux-ext4@vger.kernel.org, > Stephen Warren <swarren@nvidia.com> > Subject: [PATCH] mke2fs: restore verbose message for BLKDISCARD > > From: Stephen Warren <swarren@nvidia.com> > > mke2fs on a large slow eMMC device may appear to hang while executing > ioctl(BLKDISCARD). CTRL-C and CTRL-\ don't appear to respond, or respond > extremely slowly. -v doesn't give any hints what's happening. Only strace > is a clue. Make -v print some clues to make it easier to track down the > apparent hang. > > This change re-uses the original messages that were implemented as part > of 5827d24 "mke2fs support for BLKDISCARD" in order to easily re-use the > translations of that message. Note that this patch prints the first > message before executing the IOCTL, so the user is told what's going on > before the long wait. > > Signed-off-by: Stephen Warren <swarren@nvidia.com> Note that you've added the message only around the first 4k discard which is only used to test whether the device actually support the discard... which is not very useful and it can not even know whether the whole device discard will succeed/fail so it is misleading. Moreover there actually is a message saying that we're "Discarding device blocks" and it even shows the progress. The step in which we're doing the discard (and update the progress) is given by DISCARD_STEP_MB = 2048MB. And all that in non verbose mode as well. So what's happening in your case is that your message will not be written to the output stream (no newline) for some reason even though that we call fflush() after fputs(). I am not sure how to fix it though since I always find this stream buffering and fflush quite confusing and it never worked properly for me... -Lukas > --- > Note: I've left mke2fs running for a while now (tens of minutes on a 16GiB > eMMC), and this IOCTL still hasn't completed. I'm not sure how long it's > meant to take. Perhaps the long execution time is actually a kernel bug in > the MMC core or our SDHCI driver. As mentioned we're doing 2GB discard steps so the problem really should not be on mkfs side. It is very much possible that the discard implementation on the eMMC is terribly slow (which does not surprise me). What is says about itself ? grep . /sys/block/sda/queue/discard_* > --- > misc/mke2fs.c | 13 +++++++++++++ > 1 file changed, 13 insertions(+) > > diff --git a/misc/mke2fs.c b/misc/mke2fs.c > index bbf477a..defb932 100644 > --- a/misc/mke2fs.c > +++ b/misc/mke2fs.c > @@ -2272,7 +2272,20 @@ static int mke2fs_discard_device(ext2_filsys fs) > * we do not print numeric progress resulting in failure > * afterwards. > */ > + if (verbose) { > + printf(_("Calling BLKDISCARD from %llu to %llu "), > + (unsigned long long)0, > + (unsigned long long)fs->blocksize); > + fflush(stdout); > + } > retval = io_channel_discard(fs->io, 0, fs->blocksize); > + if (verbose) { > + if (retval) > + printf(_("failed.\n")); > + else > + printf(_("succeeded.\n")); > + } > + > if (retval) > return retval; > cur = fs->blocksize; > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 03/06/2013 12:13 AM, Lukáš Czerner wrote: > On Tue, 5 Mar 2013, Stephen Warren wrote: > >> Date: Tue, 5 Mar 2013 13:25:18 -0700 >> From: Stephen Warren <swarren@wwwdotorg.org> >> To: Theodore Ts'o <tytso@mit.edu> >> Cc: Chris Ball <cjb@laptop.org>, linux-ext4@vger.kernel.org, >> Stephen Warren <swarren@nvidia.com> >> Subject: [PATCH] mke2fs: restore verbose message for BLKDISCARD >> >> From: Stephen Warren <swarren@nvidia.com> >> >> mke2fs on a large slow eMMC device may appear to hang while executing >> ioctl(BLKDISCARD). CTRL-C and CTRL-\ don't appear to respond, or respond >> extremely slowly. -v doesn't give any hints what's happening. Only strace >> is a clue. Make -v print some clues to make it easier to track down the >> apparent hang. >> >> This change re-uses the original messages that were implemented as part >> of 5827d24 "mke2fs support for BLKDISCARD" in order to easily re-use the >> translations of that message. Note that this patch prints the first >> message before executing the IOCTL, so the user is told what's going on >> before the long wait. >> >> Signed-off-by: Stephen Warren <swarren@nvidia.com> > > Note that you've added the message only around the first 4k discard > which is only used to test whether the device actually support the > discard... which is not very useful and it can not even know whether > the whole device discard will succeed/fail so it is misleading. Oops. I certainly should looked at the code better. > Moreover there actually is a message saying that we're "Discarding > device blocks" and it even shows the progress. The step in which > we're doing the discard (and update the progress) is given by > DISCARD_STEP_MB = 2048MB. And all that in non verbose mode as well. That said, it's that very first call to io_channel_discard() that is hanging then; I don't see the existing message at all. With my change, I do see the new message that I added. (So, no stdio flushing issue here). I will go investigate why this API is hanging. Sorry for the noise. ... > As mentioned we're doing 2GB discard steps so the problem really > should not be on mkfs side. It is very much possible that the > discard implementation on the eMMC is terribly slow (which does not > surprise me). What is says about itself ? > > grep . /sys/block/sda/queue/discard_* /sys/block/mmcblk1/queue/discard_granularity 0 /sys/block/mmcblk1/queue/discard_max_bytes 512 /sys/block/mmcblk1/queue/discard_zeroes_data 1 -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 7 Mar 2013, Stephen Warren wrote: > Date: Thu, 07 Mar 2013 12:47:09 -0700 > From: Stephen Warren <swarren@wwwdotorg.org> > To: Lukáš Czerner <lczerner@redhat.com> > Cc: Theodore Ts'o <tytso@mit.edu>, Chris Ball <cjb@laptop.org>, > linux-ext4@vger.kernel.org, Stephen Warren <swarren@nvidia.com> > Subject: Re: [PATCH] mke2fs: restore verbose message for BLKDISCARD > > On 03/06/2013 12:13 AM, Lukáš Czerner wrote: > > On Tue, 5 Mar 2013, Stephen Warren wrote: > > > >> Date: Tue, 5 Mar 2013 13:25:18 -0700 > >> From: Stephen Warren <swarren@wwwdotorg.org> > >> To: Theodore Ts'o <tytso@mit.edu> > >> Cc: Chris Ball <cjb@laptop.org>, linux-ext4@vger.kernel.org, > >> Stephen Warren <swarren@nvidia.com> > >> Subject: [PATCH] mke2fs: restore verbose message for BLKDISCARD > >> > >> From: Stephen Warren <swarren@nvidia.com> > >> > >> mke2fs on a large slow eMMC device may appear to hang while executing > >> ioctl(BLKDISCARD). CTRL-C and CTRL-\ don't appear to respond, or respond > >> extremely slowly. -v doesn't give any hints what's happening. Only strace > >> is a clue. Make -v print some clues to make it easier to track down the > >> apparent hang. > >> > >> This change re-uses the original messages that were implemented as part > >> of 5827d24 "mke2fs support for BLKDISCARD" in order to easily re-use the > >> translations of that message. Note that this patch prints the first > >> message before executing the IOCTL, so the user is told what's going on > >> before the long wait. > >> > >> Signed-off-by: Stephen Warren <swarren@nvidia.com> > > > > Note that you've added the message only around the first 4k discard > > which is only used to test whether the device actually support the > > discard... which is not very useful and it can not even know whether > > the whole device discard will succeed/fail so it is misleading. > > Oops. I certainly should looked at the code better. > > > Moreover there actually is a message saying that we're "Discarding > > device blocks" and it even shows the progress. The step in which > > we're doing the discard (and update the progress) is given by > > DISCARD_STEP_MB = 2048MB. And all that in non verbose mode as well. > > That said, it's that very first call to io_channel_discard() that is > hanging then; I don't see the existing message at all. With my change, I > do see the new message that I added. (So, no stdio flushing issue here). It's just one fs block discard, are you sure about that ? > > I will go investigate why this API is hanging. Sorry for the noise. > > ... > > As mentioned we're doing 2GB discard steps so the problem really > > should not be on mkfs side. It is very much possible that the > > discard implementation on the eMMC is terribly slow (which does not > > surprise me). What is says about itself ? > > > > grep . /sys/block/sda/queue/discard_* > > /sys/block/mmcblk1/queue/discard_granularity > 0 > /sys/block/mmcblk1/queue/discard_max_bytes > 512 Here is your problem. Doing discard of 16GB (or even one step of 2GB) on the slow device such the eMMC certainly is just 512B at a time is expected to be *really* slow! If you want to check out how fast/slow it actually is you can issue discard directly on the device using blkdiscard toos which should be part of the very recent util linux (not sure if it has been released yet, but you can try git repository). -Lukas > /sys/block/mmcblk1/queue/discard_zeroes_data > 1 > >
On 03/08/2013 12:23 AM, Lukáš Czerner wrote: > On Thu, 7 Mar 2013, Stephen Warren wrote: > >> Date: Thu, 07 Mar 2013 12:47:09 -0700 >> From: Stephen Warren <swarren@wwwdotorg.org> >> To: Lukáš Czerner <lczerner@redhat.com> >> Cc: Theodore Ts'o <tytso@mit.edu>, Chris Ball <cjb@laptop.org>, >> linux-ext4@vger.kernel.org, Stephen Warren <swarren@nvidia.com> >> Subject: Re: [PATCH] mke2fs: restore verbose message for BLKDISCARD >> >> On 03/06/2013 12:13 AM, Lukáš Czerner wrote: >>> On Tue, 5 Mar 2013, Stephen Warren wrote: >>> >>>> Date: Tue, 5 Mar 2013 13:25:18 -0700 >>>> From: Stephen Warren <swarren@wwwdotorg.org> >>>> To: Theodore Ts'o <tytso@mit.edu> >>>> Cc: Chris Ball <cjb@laptop.org>, linux-ext4@vger.kernel.org, >>>> Stephen Warren <swarren@nvidia.com> >>>> Subject: [PATCH] mke2fs: restore verbose message for BLKDISCARD >>>> >>>> From: Stephen Warren <swarren@nvidia.com> >>>> >>>> mke2fs on a large slow eMMC device may appear to hang while executing >>>> ioctl(BLKDISCARD). CTRL-C and CTRL-\ don't appear to respond, or respond >>>> extremely slowly. -v doesn't give any hints what's happening. Only strace >>>> is a clue. Make -v print some clues to make it easier to track down the >>>> apparent hang. >>>> >>>> This change re-uses the original messages that were implemented as part >>>> of 5827d24 "mke2fs support for BLKDISCARD" in order to easily re-use the >>>> translations of that message. Note that this patch prints the first >>>> message before executing the IOCTL, so the user is told what's going on >>>> before the long wait. >>>> >>>> Signed-off-by: Stephen Warren <swarren@nvidia.com> >>> >>> Note that you've added the message only around the first 4k discard >>> which is only used to test whether the device actually support the >>> discard... which is not very useful and it can not even know whether >>> the whole device discard will succeed/fail so it is misleading. >> >> Oops. I certainly should looked at the code better. >> >>> Moreover there actually is a message saying that we're "Discarding >>> device blocks" and it even shows the progress. The step in which >>> we're doing the discard (and update the progress) is given by >>> DISCARD_STEP_MB = 2048MB. And all that in non verbose mode as well. >> >> That said, it's that very first call to io_channel_discard() that is >> hanging then; I don't see the existing message at all. With my change, I >> do see the new message that I added. (So, no stdio flushing issue here). > > It's just one fs block discard, are you sure about that ? Yes: root@localhost:~# strace -e ioctl ./e2fsprogs/misc/mke2fs -v /dev/mmcblk1p1 mke2fs 1.43-WIP (21-Jan-2013) ioctl(3, BLKGETSIZE64, 0xbe940110) = 0 fs_types for mke2fs.conf resolution: 'ext2' ioctl(3, BLKSSZGET, 0xbe940360) = 0 ioctl(3, BLKPBSZGET, 0xbe940364) = 0 ioctl(3, BLKDISCARDZEROES, 0xbe940324) = 0 ioctl(3, BLKROGET, 0xbe940324) = 0 Calling BLKDISCARD from 0 to 4096 ioctl(3, BLKDISCARD >> I will go investigate why this API is hanging. Sorry for the noise. -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
I hate to suggest this, but there's so many crappy devices out there that I'm wondering if we need to figure out some way of maintaining a black list of devices that don't handle discard properly. For example, the Sandisk U100 advertises a max discard granularity of 512 bytes, but I've been advised that if you don't use a discard granularity of 256k, aligned on 256k, you'll be very, very, sorry. It sounds like Stephen's device is probably an example of Yet Another Busted Trim implementation. The problem is that manufacturers will be releasing more broken products faster than we can update a blacklist in the kernel. So any blacklist would have to be maintained online on the web, and dynamically updated by distro installers. :-( - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Here's an idea. What if we set a timer when send the first probing discard, and if it takes too long, we print an error message and then try to recover. More generally, if we can't trust a device's advertised discard parameters, maybe it makes sense to create a userspace program that benchmarks the device to determine various critical flash parameters, including "does discard really work?" "Is it hopelessly slow"', and "what is the erase block size"? This program can then send the appropriate parameters to a filesystem's mkfs program. - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, On Fri, Mar 08 2013, Theodore Ts'o wrote: > I hate to suggest this, but there's so many crappy devices out there > that I'm wondering if we need to figure out some way of maintaining a > black list of devices that don't handle discard properly. Yes, we already do that in the MMC layer. See e.g.: http://git.kernel.org/cgit/linux/kernel/git/cjb/mmc.git/commit/?id=3550ccdb9d8d350e526b809bf3dd92b550a74fe1 ("mmc: card: Skip secure erase on MoviNAND; causes unrecoverable corruption.") Thanks, - Chris.
On 03/08/2013 12:08 PM, Chris Ball wrote: > Hi, > > On Fri, Mar 08 2013, Theodore Ts'o wrote: >> I hate to suggest this, but there's so many crappy devices out there >> that I'm wondering if we need to figure out some way of maintaining a >> black list of devices that don't handle discard properly. > > Yes, we already do that in the MMC layer. See e.g.: > > http://git.kernel.org/cgit/linux/kernel/git/cjb/mmc.git/commit/?id=3550ccdb9d8d350e526b809bf3dd92b550a74fe1 > ("mmc: card: Skip secure erase on MoviNAND; causes unrecoverable corruption.") Unrecoverable sounds scary. Does that just mean data corruption, or a permanently broken device? mk2e2fs -K followed by e2fsck and mounting luckily seems to still work OK for me:-) FWIW, my CID is 45010053454d31364790711c665f3e00, i.e. manufacturer ID 0x45, device name SEM16G. -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, On Fri, Mar 08 2013, Stephen Warren wrote: >> On Fri, Mar 08 2013, Theodore Ts'o wrote: >>> I hate to suggest this, but there's so many crappy devices out there >>> that I'm wondering if we need to figure out some way of maintaining a >>> black list of devices that don't handle discard properly. >> >> Yes, we already do that in the MMC layer. See e.g.: >> >> http://git.kernel.org/cgit/linux/kernel/git/cjb/mmc.git/commit/?id=3550ccdb9d8d350e526b809bf3dd92b550a74fe1 >> ("mmc: card: Skip secure erase on MoviNAND; causes unrecoverable corruption.") > > Unrecoverable sounds scary. Does that just mean data corruption, or a > permanently broken device? mk2e2fs -K followed by e2fsck and mounting > luckily seems to still work OK for me:-) Incredibly, it appears to cause a permanently broken device. The URLs in the commit message have some more details. - Chris.
On 3/8/13 1:00 PM, Theodore Ts'o wrote: > I hate to suggest this, but there's so many crappy devices out there > that I'm wondering if we need to figure out some way of maintaining a > black list of devices that don't handle discard properly. For > example, the Sandisk U100 advertises a max discard granularity of 512 > bytes, but I've been advised that if you don't use a discard > granularity of 256k, aligned on 256k, you'll be very, very, sorry. > It sounds like Stephen's device is probably an example of Yet Another > Busted Trim implementation. The problem is that manufacturers will be > releasing more broken products faster than we can update a blacklist > in the kernel. So any blacklist would have to be maintained online on > the web, and dynamically updated by distro installers. :-( Yeah I think it may be time for a blacklist. TBH I think we should revisit discard-at-mkfs-time by default as well. It seemed like a decent idea at the time, but now we have handy fstrim as well as online/dynamic/realtime trim, and trim at mkfs makes mke2fs completely un-doable in the event of fat fingers. -Eric > > - Ted > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 11 Mar 2013, Eric Sandeen wrote: > Date: Mon, 11 Mar 2013 09:08:20 -0500 > From: Eric Sandeen <sandeen@redhat.com> > To: Theodore Ts'o <tytso@mit.edu> > Cc: Stephen Warren <swarren@wwwdotorg.org>, > Lukáš Czerner <lczerner@redhat.com>, Chris Ball <cjb@laptop.org>, > linux-ext4@vger.kernel.org, Stephen Warren <swarren@nvidia.com> > Subject: Re: [PATCH] mke2fs: restore verbose message for BLKDISCARD > > On 3/8/13 1:00 PM, Theodore Ts'o wrote: > > I hate to suggest this, but there's so many crappy devices out there > > that I'm wondering if we need to figure out some way of maintaining a > > black list of devices that don't handle discard properly. For > > example, the Sandisk U100 advertises a max discard granularity of 512 > > bytes, but I've been advised that if you don't use a discard > > granularity of 256k, aligned on 256k, you'll be very, very, sorry. > > > It sounds like Stephen's device is probably an example of Yet Another > > Busted Trim implementation. The problem is that manufacturers will be > > releasing more broken products faster than we can update a blacklist > > in the kernel. So any blacklist would have to be maintained online on > > the web, and dynamically updated by distro installers. :-( > > Yeah I think it may be time for a blacklist. > > TBH I think we should revisit discard-at-mkfs-time by default as well. > > It seemed like a decent idea at the time, but now we have handy > fstrim as well as online/dynamic/realtime trim, and trim at mkfs > makes mke2fs completely un-doable in the event of fat fingers. However that will make things even worse when someone uses it on such device, than simply realising the problem on mkfs time. Sure, there are buggy or crappy devices out there, but that's not we want to optimize for right ? Regarding the 'fat fingers' point I am starting to thing that refusing to create the file system if we found any old signature (like xfs does) would really be a good thing to have. In order not to break scripts we can fall back to the old behaviour if we find out that there is not any interactive input attached (we're running from the script). -Lukas > > -Eric > > > > > > - Ted > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > >
diff --git a/misc/mke2fs.c b/misc/mke2fs.c index bbf477a..defb932 100644 --- a/misc/mke2fs.c +++ b/misc/mke2fs.c @@ -2272,7 +2272,20 @@ static int mke2fs_discard_device(ext2_filsys fs) * we do not print numeric progress resulting in failure * afterwards. */ + if (verbose) { + printf(_("Calling BLKDISCARD from %llu to %llu "), + (unsigned long long)0, + (unsigned long long)fs->blocksize); + fflush(stdout); + } retval = io_channel_discard(fs->io, 0, fs->blocksize); + if (verbose) { + if (retval) + printf(_("failed.\n")); + else + printf(_("succeeded.\n")); + } + if (retval) return retval; cur = fs->blocksize;