Patchwork [2/2,v2] debugfs: dump a sparse file as a new sparse file

login
register
mail settings
Submitter Zheng Liu
Date Jan. 1, 2013, 12:30 p.m.
Message ID <1357043415-24668-3-git-send-email-wenqing.lz@taobao.com>
Download mbox | patch
Permalink /patch/208904/
State Superseded
Headers show

Comments

Zheng Liu - Jan. 1, 2013, 12:30 p.m.
From: Zheng Liu <wenqing.lz@taobao.com>

For dumping a sparse file, ext2fs_file_read2 is defined to expand the interface.
It returns the sizeof hole and we can call lseek64(2) to skip it.

CC: George Spelvin <linux@horizon.com>
CC: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
---
 debugfs/dump.c      | 14 +++++++++++++-
 lib/ext2fs/ext2fs.h |  3 +++
 lib/ext2fs/fileio.c | 50 +++++++++++++++++++++++++++++++++++++++-----------
 3 files changed, 55 insertions(+), 12 deletions(-)
Theodore Ts'o - Jan. 1, 2013, 8:38 p.m.
On Tue, Jan 01, 2013 at 08:30:15PM +0800, Zheng Liu wrote:
> +errcode_t ext2fs_file_read2(ext2_file_t file, void *buf,
> +			    unsigned int wanted, unsigned int *got,
> +			    ext2_off64_t *seek)

I'm a bit concenred about this abstraction.  Consider what happens if
wanted is greater than a block size --- for example, consider if
wanted is 16k, and every other 1k block is uninitialized.

Then ext2fs_file_read2() will return *got set to 8k, and *seek set to
8k, and the buffer will contain the blocks that are initialized packed
up right against each other.

Worse, ext2fs_file_read() will do the same thing, so this commit
changes how ext2fs_file_read() functions, and a program which expects
to get the correct contents from the file will malfunction.

       	   	   	    	     	       - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zheng Liu - Jan. 4, 2013, 4:05 a.m.
On Tue, Jan 01, 2013 at 03:38:58PM -0500, Theodore Ts'o wrote:
> On Tue, Jan 01, 2013 at 08:30:15PM +0800, Zheng Liu wrote:
> > +errcode_t ext2fs_file_read2(ext2_file_t file, void *buf,
> > +			    unsigned int wanted, unsigned int *got,
> > +			    ext2_off64_t *seek)
> 
> I'm a bit concenred about this abstraction.  Consider what happens if
> wanted is greater than a block size --- for example, consider if
> wanted is 16k, and every other 1k block is uninitialized.

Hi Ted, 

I wonder why wanted is 16k.  If a program calls ext2fs_file_read()
function, seek will be 0 and SEEK flag won't be marked.  The behavior of
ext2fs_file_read() is the same as before.  If ext2fs_file_read2() is
called by dump_file(), seek won't be 0 and wanted is always equal to
block size.  That is why I fix the hard-coded buffer length in dump_file().
If I miss something, please let me know.

Thanks,
                                                - Zheng

> 
> Then ext2fs_file_read2() will return *got set to 8k, and *seek set to
> 8k, and the buffer will contain the blocks that are initialized packed
> up right against each other.
> 
> Worse, ext2fs_file_read() will do the same thing, so this commit
> changes how ext2fs_file_read() functions, and a program which expects
> to get the correct contents from the file will malfunction.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o - Jan. 4, 2013, 7:37 p.m.
On Fri, Jan 04, 2013 at 12:05:05PM +0800, Zheng Liu wrote:
> >
> > I'm a bit concenred about this abstraction.  Consider what happens if
> > wanted is greater than a block size --- for example, consider if
> > wanted is 16k, and every other 1k block is uninitialized.
>
> Hi Ted,
>
> I wonder why wanted is 16k.  If a program calls ext2fs_file_read()
> function, seek will be 0 and SEEK flag won't be marked.  The behavior of
> ext2fs_file_read() is the same as before.  If ext2fs_file_read2() is
> called by dump_file(), seek won't be 0 and wanted is always equal to
> block size.  That is why I fix the hard-coded buffer length in dump_file().
> If I miss something, please let me know.

The problem is that ext2fs_file_read() is an exported function, and
there are users of this API/ABI outside of e2fsprogs.

The goal of this function is that it should look like the read system
call, and the caller might not know what the blocksize might be.  So
if the caller uses a 4k fixed size buffer, and the underlying file
system blocksize is 1k, this function needs to work properly.

So consider what happens if some program, perhaps an ext[234] FUSE
driver (there are two or three of them out there), or the e2tools
package, uses a 4k or 16k buffer --- this is legal, and they call the
existing ext2fs_file_read() library function.  In your patch,
ext2fs_file_read() will call ext2fs_file_read2(), and it will skip the
sparse blocks, and since the returned seek pointer is null, there's no
possible way for the caller of the ext2fs_file_read() would know this
had happened --- and even if there was a way, we don't ever change the
semantics/behaviour of an existing functional interface unless it's a
clear bug (and even then we need to think very carefully about the
backwards compatibility implications).

Regards,

                                                - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zheng Liu - Jan. 5, 2013, 4:45 a.m.
Hi Ted,

Sorry, I am still confused.  Maybe I misunderstand something.  Please
bear with me.

On Fri, Jan 04, 2013 at 02:37:09PM -0500, Theodore Ts'o wrote:
> On Fri, Jan 04, 2013 at 12:05:05PM +0800, Zheng Liu wrote:
> > >
> > > I'm a bit concenred about this abstraction.  Consider what happens if
> > > wanted is greater than a block size --- for example, consider if
> > > wanted is 16k, and every other 1k block is uninitialized.
> >
> > Hi Ted,
> >
> > I wonder why wanted is 16k.  If a program calls ext2fs_file_read()
> > function, seek will be 0 and SEEK flag won't be marked.  The behavior of
> > ext2fs_file_read() is the same as before.  If ext2fs_file_read2() is
> > called by dump_file(), seek won't be 0 and wanted is always equal to
> > block size.  That is why I fix the hard-coded buffer length in dump_file().
> > If I miss something, please let me know.
> 
> The problem is that ext2fs_file_read() is an exported function, and
> there are users of this API/ABI outside of e2fsprogs.
> 
> The goal of this function is that it should look like the read system
> call, and the caller might not know what the blocksize might be.  So
> if the caller uses a 4k fixed size buffer, and the underlying file
> system blocksize is 1k, this function needs to work properly.

Yeah, Caller can use a fixed size buffer and needn't to care about the
underlying file system blocksize.

> 
> So consider what happens if some program, perhaps an ext[234] FUSE
> driver (there are two or three of them out there), or the e2tools
> package, uses a 4k or 16k buffer --- this is legal, and they call the
> existing ext2fs_file_read() library function.  In your patch,
> ext2fs_file_read() will call ext2fs_file_read2(), and it will skip the
> sparse blocks, and since the returned seek pointer is null, there's no
> possible way for the caller of the ext2fs_file_read() would know this
> had happened --- and even if there was a way, we don't ever change the
> semantics/behaviour of an existing functional interface unless it's a
> clear bug (and even then we need to think very carefully about the
> backwards compatibility implications).

Yes, some programs call ext2fs_file_read() with a 4k or 16k fixed size
buffer, and ext2fs_file_read() calls ext2fs_file_read2().  But it won't
skip the sparse blocks because when ext2fs_file_read2() is called in
ext2fs_file_read(), the last argument, namely 'seek', is 0.  That means
that in ext2fs_file_read2() 'flags' is 0.  Thus, in load_buffer()
'flags' is not equal to SEEK, and EXT2_FILE_BUF_VALID is marked.  Then
we return back to ext2fs_file_read2() and all data in file->buf is
copied.  So I think the behavior of ext2fs_file_read() doesn't be
changed.

On the other hand, if ext2fs_file_read2() is called by dump_file()
directly.  'seek' is not 0 and 'wanted' is equal to blocksize.  In
ext2fs_file_read2() 'flags' is assigned to SEEK and in load_buffer()
EXT2_FILE_BUF_VALID is not marked if it meets an uninitialized extent.

Am I missing something?  Thanks for your time.

Regards,
                                                - Zheng
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o - Jan. 9, 2013, 2:58 p.m.
On Sat, Jan 05, 2013 at 12:45:22PM +0800, Zheng Liu wrote:
> Yes, some programs call ext2fs_file_read() with a 4k or 16k fixed size
> buffer, and ext2fs_file_read() calls ext2fs_file_read2().  But it won't
> skip the sparse blocks because when ext2fs_file_read2() is called in
> ext2fs_file_read(), the last argument, namely 'seek', is 0.  That means
> that in ext2fs_file_read2() 'flags' is 0.  Thus, in load_buffer()
> 'flags' is not equal to SEEK, and EXT2_FILE_BUF_VALID is marked.  Then
> we return back to ext2fs_file_read2() and all data in file->buf is
> copied.  So I think the behavior of ext2fs_file_read() doesn't be
> changed.

You're right; I had forgotten about that part of the change.

I still am a bit concerned about the interface, because if you specify
a pointer to seek in ext2fs_file_read2(), you have to know what the
file system blocksize is, because if you give a count which is larger
than a single block, the value of the returned seek and the data which
is returned in the buffer is impossible to interpret (consider a file
where every other 1k block is sparse, and you try to read into a 4k
buffer).

So what I would suggest is the following as a better, more efficient
interface.

1) Add a new flag which can be passed into ext2_file_open() which
requests sparse-intelligent handling.

2) If the sparse flag is set, then ext2_file_read() will stop the read
when it runs into the first uninitialized or sparse block.  That is,
consider the example file which has 8k of data, a 4k uninitialized
block, and then 12k of data after that.  If the sparse flag is passed
to ext2_file_open(), then ext2fs_file_read(fd, buf, 16384, &got) will
read 8k of data into buf, and return with got set to 8192.

3) To distinguish between EOF and a sparse block, if the current file
offset is pointed at a sparse/uninitialized block, and the sparse flag
was passed to ext2_file_open(), then in addition to *got getting set
0, ext2_file_read() will also return a new error code,
EXT2_ET_READ_HOLE_FOUND.

4) We also extend ext2_file_llseek() to also support EXT2_SEEK_HOLE
and EXT2_SEEK_DATA, which works like SEEK_HOLE and SEEK_DATA flags to
llseek().  This will allow the caller to efficiently find the next
part of the file with valid data.

What I like about this interface is that we don't need to define a new
ext2_file_read2(), and it is also more efficient for an application
which is interested in reading multiple blocks at a time.

What do you think?

						- Ted


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zheng Liu - Jan. 9, 2013, 3:34 p.m.
On Wed, Jan 09, 2013 at 09:58:54AM -0500, Theodore Ts'o wrote:
> On Sat, Jan 05, 2013 at 12:45:22PM +0800, Zheng Liu wrote:
> > Yes, some programs call ext2fs_file_read() with a 4k or 16k fixed size
> > buffer, and ext2fs_file_read() calls ext2fs_file_read2().  But it won't
> > skip the sparse blocks because when ext2fs_file_read2() is called in
> > ext2fs_file_read(), the last argument, namely 'seek', is 0.  That means
> > that in ext2fs_file_read2() 'flags' is 0.  Thus, in load_buffer()
> > 'flags' is not equal to SEEK, and EXT2_FILE_BUF_VALID is marked.  Then
> > we return back to ext2fs_file_read2() and all data in file->buf is
> > copied.  So I think the behavior of ext2fs_file_read() doesn't be
> > changed.
> 
> You're right; I had forgotten about that part of the change.
> 
> I still am a bit concerned about the interface, because if you specify
> a pointer to seek in ext2fs_file_read2(), you have to know what the
> file system blocksize is, because if you give a count which is larger
> than a single block, the value of the returned seek and the data which
> is returned in the buffer is impossible to interpret (consider a file
> where every other 1k block is sparse, and you try to read into a 4k
> buffer).
> 
> So what I would suggest is the following as a better, more efficient
> interface.
> 
> 1) Add a new flag which can be passed into ext2_file_open() which
> requests sparse-intelligent handling.
> 
> 2) If the sparse flag is set, then ext2_file_read() will stop the read
> when it runs into the first uninitialized or sparse block.  That is,
> consider the example file which has 8k of data, a 4k uninitialized
> block, and then 12k of data after that.  If the sparse flag is passed
> to ext2_file_open(), then ext2fs_file_read(fd, buf, 16384, &got) will
> read 8k of data into buf, and return with got set to 8192.
> 
> 3) To distinguish between EOF and a sparse block, if the current file
> offset is pointed at a sparse/uninitialized block, and the sparse flag
> was passed to ext2_file_open(), then in addition to *got getting set
> 0, ext2_file_read() will also return a new error code,
> EXT2_ET_READ_HOLE_FOUND.
> 
> 4) We also extend ext2_file_llseek() to also support EXT2_SEEK_HOLE
> and EXT2_SEEK_DATA, which works like SEEK_HOLE and SEEK_DATA flags to
> llseek().  This will allow the caller to efficiently find the next
> part of the file with valid data.
> 
> What I like about this interface is that we don't need to define a new
> ext2_file_read2(), and it is also more efficient for an application
> which is interested in reading multiple blocks at a time.
> 
> What do you think?

Thanks so much for your advices.  I will try to generate the latest
patches.

Regards,
                                                - Zheng
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
George Spelvin - Jan. 10, 2013, 12:42 a.m.
> 2) If the sparse flag is set, then ext2_file_read() will stop the read
> when it runs into the first uninitialized or sparse block.  That is,
> consider the example file which has 8k of data, a 4k uninitialized
> block, and then 12k of data after that.  If the sparse flag is passed
> to ext2_file_open(), then ext2fs_file_read(fd, buf, 16384, &got) will
> read 8k of data into buf, and return with got set to 8192.
> 
> 3) To distinguish between EOF and a sparse block, if the current file
> offset is pointed at a sparse/uninitialized block, and the sparse flag
> was passed to ext2_file_open(), then in addition to *got getting set
> 0, ext2_file_read() will also return a new error code,
> EXT2_ET_READ_HOLE_FOUND.

Given that the current model of ext2fs_file_read is that it returns
some valid data in *got, AND the reason it stopped short as the retval,
wouldn't it make more sense to return EXT2_ET_READ_HOLE_FOUND from the
*first* read call?

It's a minor thing, as you just end up falling back to the Unix syscall
model of deferring the error until the next read call, but wouldn't
it be more consistent?
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/debugfs/dump.c b/debugfs/dump.c
index 5b2289c..d4483dc 100644
--- a/debugfs/dump.c
+++ b/debugfs/dump.c
@@ -107,6 +107,7 @@  static void dump_file(const char *cmdname, ext2_ino_t ino, int fd,
 	struct ext2_inode	inode;
 	char		*buf = 0;
 	ext2_file_t	e2_file;
+	ext2_off64_t	seek;
 	int		nbytes;
 	unsigned int	got, blocksize = current_fs->blocksize;
 
@@ -124,11 +125,22 @@  static void dump_file(const char *cmdname, ext2_ino_t ino, int fd,
 		return;
 	}
 	while (1) {
-		retval = ext2fs_file_read(e2_file, buf, blocksize, &got);
+		/* If fd is stdout, we won't consider a sparse file. */
+		if (fd == 1)
+			retval = ext2fs_file_read(e2_file, buf, blocksize,
+						  &got);
+		else
+			retval = ext2fs_file_read2(e2_file, buf, blocksize,
+						   &got, &seek);
 		if (retval)
 			com_err(cmdname, retval, "while reading ext2 file");
 		if (got == 0)
 			break;
+		if (fd != 1) {
+			nbytes = lseek64(fd, blocksize * seek, SEEK_CUR);
+			if (nbytes < 0)
+				com_err(cmdname, errno, "while lseek file");
+		}
 		nbytes = write(fd, buf, got);
 		if ((unsigned) nbytes != got)
 			com_err(cmdname, errno, "while writing file");
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 7ec189e..435b68c 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -1171,6 +1171,9 @@  extern errcode_t ext2fs_file_close(ext2_file_t file);
 extern errcode_t ext2fs_file_flush(ext2_file_t file);
 extern errcode_t ext2fs_file_read(ext2_file_t file, void *buf,
 				  unsigned int wanted, unsigned int *got);
+extern errcode_t ext2fs_file_read2(ext2_file_t file, void *buf,
+				   unsigned int wanted, unsigned int *got,
+				   ext2_off64_t *seek);
 extern errcode_t ext2fs_file_write(ext2_file_t file, const void *buf,
 				   unsigned int nbytes, unsigned int *written);
 extern errcode_t ext2fs_file_llseek(ext2_file_t file, __u64 offset,
diff --git a/lib/ext2fs/fileio.c b/lib/ext2fs/fileio.c
index 1f7002c..508d15c 100644
--- a/lib/ext2fs/fileio.c
+++ b/lib/ext2fs/fileio.c
@@ -179,20 +179,27 @@  static errcode_t sync_buffer_position(ext2_file_t file)
  * If dontfill is true, then skip initializing the buffer since we're
  * going to be replacing its entire contents anyway.  If set, then the
  * function basically only sets file->physblock and EXT2_FILE_BUF_VALID
+ *
+ * If seek is true, it indicates that we need to skip unwritten extents.
+ * When we get an unwritten extent, EXT2_FILE_BUF_VALID won't be set to
+ * tell caller that here is an unwritten extent and its content needn't
+ * be copied.
  */
 #define DONTFILL 1
-static errcode_t load_buffer(ext2_file_t file, int dontfill)
+#define SEEK	 2
+static errcode_t load_buffer(ext2_file_t file, int flags)
 {
 	ext2_filsys	fs = file->fs;
 	errcode_t	retval;
+	int		ret_flags;
 
 	if (!(file->flags & EXT2_FILE_BUF_VALID)) {
 		retval = ext2fs_bmap2(fs, file->ino, &file->inode,
-				     BMAP_BUFFER, 0, file->blockno, 0,
-				     &file->physblock);
+				     BMAP_BUFFER, 0, file->blockno,
+				     &ret_flags, &file->physblock);
 		if (retval)
 			return retval;
-		if (!dontfill) {
+		if (flags != DONTFILL) {
 			if (file->physblock) {
 				retval = io_channel_read_blk(fs->io,
 							     file->physblock,
@@ -202,7 +209,9 @@  static errcode_t load_buffer(ext2_file_t file, int dontfill)
 			} else
 				memset(file->buf, 0, fs->blocksize);
 		}
-		file->flags |= EXT2_FILE_BUF_VALID;
+		if ((flags != SEEK) ||
+		    (!(ret_flags & BMAP_RET_UNINIT) && file->physblock))
+			file->flags |= EXT2_FILE_BUF_VALID;
 	}
 	return 0;
 }
@@ -227,20 +236,33 @@  errcode_t ext2fs_file_close(ext2_file_t file)
 errcode_t ext2fs_file_read(ext2_file_t file, void *buf,
 			   unsigned int wanted, unsigned int *got)
 {
+	return ext2fs_file_read2(file, buf, wanted, got, 0);
+}
+
+
+errcode_t ext2fs_file_read2(ext2_file_t file, void *buf,
+			    unsigned int wanted, unsigned int *got,
+			    ext2_off64_t *seek)
+{
 	ext2_filsys	fs;
 	errcode_t	retval = 0;
 	unsigned int	start, c, count = 0;
 	__u64		left;
 	char		*ptr = (char *) buf;
+	int		seek_cnt = 0;
+	int		flags = 0;
 
 	EXT2_CHECK_MAGIC(file, EXT2_ET_MAGIC_EXT2_FILE);
 	fs = file->fs;
 
+	if (seek)
+		flags = SEEK;
+
 	while ((file->pos < EXT2_I_SIZE(&file->inode)) && (wanted > 0)) {
 		retval = sync_buffer_position(file);
 		if (retval)
 			goto fail;
-		retval = load_buffer(file, 0);
+		retval = load_buffer(file, flags);
 		if (retval)
 			goto fail;
 
@@ -248,20 +270,26 @@  errcode_t ext2fs_file_read(ext2_file_t file, void *buf,
 		c = fs->blocksize - start;
 		if (c > wanted)
 			c = wanted;
-		left = EXT2_I_SIZE(&file->inode) - file->pos ;
+		left = EXT2_I_SIZE(&file->inode) - file->pos;
 		if (c > left)
 			c = left;
 
-		memcpy(ptr, file->buf+start, c);
 		file->pos += c;
-		ptr += c;
-		count += c;
-		wanted -= c;
+		if (file->flags & EXT2_FILE_BUF_VALID) {
+			memcpy(ptr, file->buf+start, c);
+			ptr += c;
+			count += c;
+			wanted -= c;
+		} else {
+			seek_cnt++;
+		}
 	}
 
 fail:
 	if (got)
 		*got = count;
+	if (seek)
+		*seek = seek_cnt;
 	return retval;
 }