diff mbox

[5,2/4] Return 32/64-bit dir name hash according to usage type

Message ID 20120109132148.2616029.68798.stgit@localhost.localdomain
State Superseded, archived
Headers show

Commit Message

Bernd Schubert Jan. 9, 2012, 1:21 p.m. UTC
From: Fan Yong <yong.fan@whamcloud.com>

Traditionally ext2/3/4 has returned a 32-bit hash value from llseek()
to appease NFSv2, which can only handle a 32-bit cookie for seekdir()
and telldir().  However, this causes problems if there are 32-bit hash
collisions, since the NFSv2 server can get stuck resending the same
entries from the directory repeatedly.

Allow ext4 to return a full 64-bit hash (both major and minor) for
telldir to decrease the chance of hash collisions.  This still needs
integration on the NFS side.

Patch-updated-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
(blame me if something is not correct)

Signed-off-by: Fan Yong <yong.fan@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
---
 fs/ext4/dir.c  |  185 ++++++++++++++++++++++++++++++++++++++++++++------------
 fs/ext4/ext4.h |    6 ++
 fs/ext4/hash.c |    4 +
 3 files changed, 154 insertions(+), 41 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Theodore Ts'o March 5, 2012, 3:59 p.m. UTC | #1
On Mon, Jan 09, 2012 at 02:21:48PM +0100, Bernd Schubert wrote:
> diff --git a/fs/ext4/hash.c b/fs/ext4/hash.c
> index ac8f168..fa8e491 100644
> --- a/fs/ext4/hash.c
> +++ b/fs/ext4/hash.c
> @@ -200,8 +200,8 @@ int ext4fs_dirhash(const char *name, int len, struct dx_hash_info *hinfo)
>  		return -1;
>  	}
>  	hash = hash & ~1;
> -	if (hash == (EXT4_HTREE_EOF << 1))
> -		hash = (EXT4_HTREE_EOF-1) << 1;
> +	if (hash == (EXT4_HTREE_EOF_32BIT << 1))
> +		hash = (EXT4_HTREE_EOF_32BIT - 1) << 1;
>  	hinfo->hash = hash;
>  	hinfo->minor_hash = minor_hash;
>  	return 0;

Is there a reason why we don't need to avoid the collsion with the
64-bit EOF value as well?  i.e., I think we also need to add:

	if (hash == (EXT4_HTREE_EOF_64BIT << 1))
		hash = (EXT4_HTREE_EOF_64BIT - 1) << 1;

		       			       	  - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bernd Schubert March 6, 2012, 12:40 a.m. UTC | #2
On 03/05/2012 04:59 PM, Ted Ts'o wrote:
> On Mon, Jan 09, 2012 at 02:21:48PM +0100, Bernd Schubert wrote:
>> diff --git a/fs/ext4/hash.c b/fs/ext4/hash.c
>> index ac8f168..fa8e491 100644
>> --- a/fs/ext4/hash.c
>> +++ b/fs/ext4/hash.c
>> @@ -200,8 +200,8 @@ int ext4fs_dirhash(const char *name, int len, struct dx_hash_info *hinfo)
>>   		return -1;
>>   	}
>>   	hash = hash&  ~1;
>> -	if (hash == (EXT4_HTREE_EOF<<  1))
>> -		hash = (EXT4_HTREE_EOF-1)<<  1;
>> +	if (hash == (EXT4_HTREE_EOF_32BIT<<  1))
>> +		hash = (EXT4_HTREE_EOF_32BIT - 1)<<  1;
>>   	hinfo->hash = hash;
>>   	hinfo->minor_hash = minor_hash;
>>   	return 0;
>
> Is there a reason why we don't need to avoid the collsion with the
> 64-bit EOF value as well?  i.e., I think we also need to add:
>
> 	if (hash == (EXT4_HTREE_EOF_64BIT<<  1))
> 		hash = (EXT4_HTREE_EOF_64BIT - 1)<<  1;
>
> 		       			       	  - Ted


Thanks for looking into it, really appreciated!

Yeah, you are right, we also should check for 64-bit EOF. But wouldn't 
be something like this be better?

	/* check for hash collision */
	if(is_32bit_api() ) {
		if (hash == (EXT4_HTREE_EOF_32BIT<<  1))
			hash = (EXT4_HTREE_EOF_32BIT - 1)<<  1;
	} else {
		if (hash == (EXT4_HTREE_EOF_64BIT<<  1))
			hash = (EXT4_HTREE_EOF_64BIT - 1)<<  1;
	}



Or am I over engineering?


Thanks,
Bernd


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o March 6, 2012, 2:28 a.m. UTC | #3
On Tue, Mar 06, 2012 at 01:40:05AM +0100, Bernd Schubert wrote:
> 
> Yeah, you are right, we also should check for 64-bit EOF. But
> wouldn't be something like this be better?
> 
> 	/* check for hash collision */
> 	if(is_32bit_api() ) {
> 		if (hash == (EXT4_HTREE_EOF_32BIT<<  1))
> 			hash = (EXT4_HTREE_EOF_32BIT - 1)<<  1;
> 	} else {
> 		if (hash == (EXT4_HTREE_EOF_64BIT<<  1))
> 			hash = (EXT4_HTREE_EOF_64BIT - 1)<<  1;
> 	}

Actually, neither change is needed, now that I look at things more
closely.  hash is a __u32, so it could never been
EXT4_HTREE_EOF_64BIT.  But given that we won't let major hash become
larger than 0xfffffffc, that means the largest possible position value
is 0x7ffffffeffffffff.  So using an EOF value of 0x0x7fffffffffffffff
will work fine.

The bigger problem that I found when I looked more closely at the
patch is that the patch uses f_flags in places where f_mode needs to
be used:

static inline loff_t hash2pos(struct file *filp, __u32 major, __u32 minor)
{
	if ((filp->f_flags & FMODE_32BITHASH) ||
                   ^^^^^^^
	    (!(filp->f_flags & FMODE_64BITHASH) && is_32bit_api()))
                     ^^^^^^^
		return major >> 1;
	else
		return ((__u64)(major >> 1) << 32) | (__u64)minor;
}

static inline __u32 pos2maj_hash(struct file *filp, loff_t pos)
{
	if ((filp->f_flags & FMODE_32BITHASH) ||
                   ^^^^^^
	    (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
                     ^^^^^^
		return (pos << 1) & 0xffffffff;
	else
		return ((pos >> 32) << 1) & 0xffffffff;
}

Which makes me wonder how much this has been tested?

      	       	      	       	    	     - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bernd Schubert March 6, 2012, 9:59 a.m. UTC | #4
On 03/06/2012 03:28 AM, Ted Ts'o wrote:
> On Tue, Mar 06, 2012 at 01:40:05AM +0100, Bernd Schubert wrote:
>>
>> Yeah, you are right, we also should check for 64-bit EOF. But
>> wouldn't be something like this be better?
>>
>> 	/* check for hash collision */
>> 	if(is_32bit_api() ) {
>> 		if (hash == (EXT4_HTREE_EOF_32BIT<<   1))
>> 			hash = (EXT4_HTREE_EOF_32BIT - 1)<<   1;
>> 	} else {
>> 		if (hash == (EXT4_HTREE_EOF_64BIT<<   1))
>> 			hash = (EXT4_HTREE_EOF_64BIT - 1)<<   1;
>> 	}
>
> Actually, neither change is needed, now that I look at things more
> closely.  hash is a __u32, so it could never been
> EXT4_HTREE_EOF_64BIT.  But given that we won't let major hash become
> larger than 0xfffffffc, that means the largest possible position value
> is 0x7ffffffeffffffff.  So using an EOF value of 0x0x7fffffffffffffff
> will work fine.

Ah, I looked after 1 a.m., seems that was too late for me to notice.

>
> The bigger problem that I found when I looked more closely at the
> patch is that the patch uses f_flags in places where f_mode needs to
> be used:
>
> static inline loff_t hash2pos(struct file *filp, __u32 major, __u32 minor)
> {
> 	if ((filp->f_flags&  FMODE_32BITHASH) ||
>                     ^^^^^^^
> 	    (!(filp->f_flags&  FMODE_64BITHASH)&&  is_32bit_api()))
>                       ^^^^^^^
> 		return major>>  1;
> 	else
> 		return ((__u64)(major>>  1)<<  32) | (__u64)minor;
> }
>
> static inline __u32 pos2maj_hash(struct file *filp, loff_t pos)
> {
> 	if ((filp->f_flags&  FMODE_32BITHASH) ||
>                     ^^^^^^
> 	    (!(filp->f_mode&  FMODE_64BITHASH)&&  is_32bit_api()))
>                       ^^^^^^
> 		return (pos<<  1)&  0xffffffff;
> 	else
> 		return ((pos>>  32)<<  1)&  0xffffffff;
> }
>
> Which makes me wonder how much this has been tested?

Arg, my bad, I introduced this issue when I converted from f_flags to 
f_mode, seems I forgot all of those above :(
Hrm, I thought I had tested sufficiently, but obviously I did not :( 
Here's the test tool.
http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/test_seekdir/

While quickly looking, I think it only affects NFSv2, which I think I 
indeed didn't test. I only run tests for 32 bit and 64-bit user space 
and NFSv3. But yes, NFSv2 is an important test too. Not sure if I will 
find time for that today.

Will send an updated version later on.


Thanks for your review,
Bernd
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o March 6, 2012, 3:15 p.m. UTC | #5
On Tue, Mar 06, 2012 at 10:59:55AM +0100, Bernd Schubert wrote:
> Arg, my bad, I introduced this issue when I converted from f_flags
> to f_mode, seems I forgot all of those above :(
> Hrm, I thought I had tested sufficiently, but obviously I did not :(
> Here's the test tool.
> http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/test_seekdir/
> 
> While quickly looking, I think it only affects NFSv2, which I think
> I indeed didn't test. I only run tests for 32 bit and 64-bit user
> space and NFSv3. But yes, NFSv2 is an important test too. Not sure
> if I will find time for that today.

I think the problem case to worry about would have been on a 32-bit
NFS server (think embedded/bookshelf NAS servers) and NFSv3, since
that's where the inconsistency would have meant that hash2pos would
use the 32-bit codepath, but pos2maj_hash would use the 64-bit
codepath, and so then...  kablooey.

> Will send an updated version later on.

If it's just those changes, no worries, as I've already fixed up the
f_flags vs. f_mode issue in my copy of the patches.

Let me know if there are any other changes that you'd like to make or
issues that you've spotted.

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bernd Schubert March 7, 2012, 9:01 a.m. UTC | #6
On 03/06/2012 04:15 PM, Ted Ts'o wrote:
> On Tue, Mar 06, 2012 at 10:59:55AM +0100, Bernd Schubert wrote:
>> Arg, my bad, I introduced this issue when I converted from f_flags
>> to f_mode, seems I forgot all of those above :(
>> Hrm, I thought I had tested sufficiently, but obviously I did not :(
>> Here's the test tool.
>> http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/test_seekdir/
>>
>> While quickly looking, I think it only affects NFSv2, which I think
>> I indeed didn't test. I only run tests for 32 bit and 64-bit user
>> space and NFSv3. But yes, NFSv2 is an important test too. Not sure
>> if I will find time for that today.
>
> I think the problem case to worry about would have been on a 32-bit
> NFS server (think embedded/bookshelf NAS servers) and NFSv3, since
> that's where the inconsistency would have meant that hash2pos would
> use the 32-bit codepath, but pos2maj_hash would use the 64-bit
> codepath, and so then...  kablooey.
>
>> Will send an updated version later on.
>
> If it's just those changes, no worries, as I've already fixed up the
> f_flags vs. f_mode issue in my copy of the patches.
>
> Let me know if there are any other changes that you'd like to make or
> issues that you've spotted.

Ted, thanks a bunch! I really will try to be more careful updating 
patches next time. And during the next days I'm also going to review the 
patches again, at least I hope I find the time for that while being on 
vacation).


Thanks again,
Bernd
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen April 20, 2012, 8:04 p.m. UTC | #7
On 1/9/12 7:21 AM, Bernd Schubert wrote:
> From: Fan Yong <yong.fan@whamcloud.com>
> 
> Traditionally ext2/3/4 has returned a 32-bit hash value from llseek()
> to appease NFSv2, which can only handle a 32-bit cookie for seekdir()
> and telldir().  However, this causes problems if there are 32-bit hash
> collisions, since the NFSv2 server can get stuck resending the same
> entries from the directory repeatedly.
> 
> Allow ext4 to return a full 64-bit hash (both major and minor) for
> telldir to decrease the chance of hash collisions.  This still needs
> integration on the NFS side.
> 
> Patch-updated-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
> (blame me if something is not correct)

Bernd, I've merged this to ext3.  Bruce thought maybe you were working
on the same.  Should I send mine?

Also...

> +/*
> + * ext4_dir_llseek() based on generic_file_llseek() to handle both
> + * non-htree and htree directories, where the "offset" is in terms
> + * of the filename hash value instead of the byte offset.
> + *
> + * NOTE: offsets obtained *before* ext4_set_inode_flag(dir, EXT4_INODE_INDEX)
> + *       will be invalid once the directory was converted into a dx directory
> + */
> +loff_t ext4_dir_llseek(struct file *file, loff_t offset, int origin)

ext4_llseek() worries about max offset for direct/indirect vs. extent-mapped
files.  Do we need to worry about the same thing in this function?

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bernd Schubert April 22, 2012, 12:51 p.m. UTC | #8
On 04/20/2012 10:04 PM, Eric Sandeen wrote:
> On 1/9/12 7:21 AM, Bernd Schubert wrote:
>> From: Fan Yong <yong.fan@whamcloud.com>
>>
>> Traditionally ext2/3/4 has returned a 32-bit hash value from llseek()
>> to appease NFSv2, which can only handle a 32-bit cookie for seekdir()
>> and telldir().  However, this causes problems if there are 32-bit hash
>> collisions, since the NFSv2 server can get stuck resending the same
>> entries from the directory repeatedly.
>>
>> Allow ext4 to return a full 64-bit hash (both major and minor) for
>> telldir to decrease the chance of hash collisions.  This still needs
>> integration on the NFS side.
>>
>> Patch-updated-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
>> (blame me if something is not correct)
> 
> Bernd, I've merged this to ext3.  Bruce thought maybe you were working
> on the same.  Should I send mine?

That is perfectly fine with me.

> 
> Also...
> 
>> +/*
>> + * ext4_dir_llseek() based on generic_file_llseek() to handle both
>> + * non-htree and htree directories, where the "offset" is in terms
>> + * of the filename hash value instead of the byte offset.
>> + *
>> + * NOTE: offsets obtained *before* ext4_set_inode_flag(dir, EXT4_INODE_INDEX)
>> + *       will be invalid once the directory was converted into a dx directory
>> + */
>> +loff_t ext4_dir_llseek(struct file *file, loff_t offset, int origin)
> 
> ext4_llseek() worries about max offset for direct/indirect vs. extent-mapped
> files.  Do we need to worry about the same thing in this function?

Hrmm, I just checked it and I think either is wrong. We only have to
care about non-dx directories, so ext4_readdir() applies, which limits
filp->f_pos < inode->i_size.
Going to send a patch tomorrow. Thanks for spotting this!


Cheers,
Bernd



--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen April 23, 2012, 8:37 p.m. UTC | #9
On 4/22/12 7:51 AM, Bernd Schubert wrote:
> On 04/20/2012 10:04 PM, Eric Sandeen wrote:
>> On 1/9/12 7:21 AM, Bernd Schubert wrote:
>>> From: Fan Yong <yong.fan@whamcloud.com>
>>>
>>> Traditionally ext2/3/4 has returned a 32-bit hash value from llseek()
>>> to appease NFSv2, which can only handle a 32-bit cookie for seekdir()
>>> and telldir().  However, this causes problems if there are 32-bit hash
>>> collisions, since the NFSv2 server can get stuck resending the same
>>> entries from the directory repeatedly.
>>>
>>> Allow ext4 to return a full 64-bit hash (both major and minor) for
>>> telldir to decrease the chance of hash collisions.  This still needs
>>> integration on the NFS side.
>>>
>>> Patch-updated-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
>>> (blame me if something is not correct)
>>
>> Bernd, I've merged this to ext3.  Bruce thought maybe you were working
>> on the same.  Should I send mine?
> 
> That is perfectly fine with me.
> 
>>
>> Also...
>>
>>> +/*
>>> + * ext4_dir_llseek() based on generic_file_llseek() to handle both
>>> + * non-htree and htree directories, where the "offset" is in terms
>>> + * of the filename hash value instead of the byte offset.
>>> + *
>>> + * NOTE: offsets obtained *before* ext4_set_inode_flag(dir, EXT4_INODE_INDEX)
>>> + *       will be invalid once the directory was converted into a dx directory
>>> + */
>>> +loff_t ext4_dir_llseek(struct file *file, loff_t offset, int origin)
>>
>> ext4_llseek() worries about max offset for direct/indirect vs. extent-mapped
>> files.  Do we need to worry about the same thing in this function?
> 
> Hrmm, I just checked it and I think either is wrong. We only have to
> care about non-dx directories, so ext4_readdir() applies, which limits
> filp->f_pos < inode->i_size.
> Going to send a patch tomorrow. Thanks for spotting this!

The other thing I'm wondering is whether, in light of

ef3d0fd27e90f67e35da516dafc1482c82939a60 vfs: do (nearly) lockless generic_file_llseek

taking the i_mutex in ext4_dir_llseek could be a perf regression vs what was there before?  Is there anything about the new function which requires stronger locking?

I may be missing something obvious about the nfs interaction, not sure.

-Eric

> Cheers,
> Bernd
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 164c560..cee09f2 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -32,24 +32,8 @@  static unsigned char ext4_filetype_table[] = {
 	DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK
 };
 
-static int ext4_readdir(struct file *, void *, filldir_t);
 static int ext4_dx_readdir(struct file *filp,
 			   void *dirent, filldir_t filldir);
-static int ext4_release_dir(struct inode *inode,
-				struct file *filp);
-
-const struct file_operations ext4_dir_operations = {
-	.llseek		= ext4_llseek,
-	.read		= generic_read_dir,
-	.readdir	= ext4_readdir,		/* we take BKL. needed?*/
-	.unlocked_ioctl = ext4_ioctl,
-#ifdef CONFIG_COMPAT
-	.compat_ioctl	= ext4_compat_ioctl,
-#endif
-	.fsync		= ext4_sync_file,
-	.release	= ext4_release_dir,
-};
-
 
 static unsigned char get_dtype(struct super_block *sb, int filetype)
 {
@@ -254,22 +238,134 @@  out:
 	return ret;
 }
 
+static inline int is_32bit_api(void)
+{
+#ifdef CONFIG_COMPAT
+	return is_compat_task();
+#else
+	return (BITS_PER_LONG == 32);
+#endif
+}
+
 /*
  * These functions convert from the major/minor hash to an f_pos
- * value.
+ * value for dx directories
  *
- * Currently we only use major hash numer.  This is unfortunate, but
- * on 32-bit machines, the same VFS interface is used for lseek and
- * llseek, so if we use the 64 bit offset, then the 32-bit versions of
- * lseek/telldir/seekdir will blow out spectacularly, and from within
- * the ext2 low-level routine, we don't know if we're being called by
- * a 64-bit version of the system call or the 32-bit version of the
- * system call.  Worse yet, NFSv2 only allows for a 32-bit readdir
- * cookie.  Sigh.
+ * Upper layer (for example NFS) should specify FMODE_32BITHASH or
+ * FMODE_64BITHASH explicitly. On the other hand, we allow ext4 to be mounted
+ * directly on both 32-bit and 64-bit nodes, under such case, neither
+ * FMODE_32BITHASH nor FMODE_64BITHASH is specified.
  */
-#define hash2pos(major, minor)	(major >> 1)
-#define pos2maj_hash(pos)	((pos << 1) & 0xffffffff)
-#define pos2min_hash(pos)	(0)
+static inline loff_t hash2pos(struct file *filp, __u32 major, __u32 minor)
+{
+	if ((filp->f_flags & FMODE_32BITHASH) ||
+	    (!(filp->f_flags & FMODE_64BITHASH) && is_32bit_api()))
+		return major >> 1;
+	else
+		return ((__u64)(major >> 1) << 32) | (__u64)minor;
+}
+
+static inline __u32 pos2maj_hash(struct file *filp, loff_t pos)
+{
+	if ((filp->f_flags & FMODE_32BITHASH) ||
+	    (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
+		return (pos << 1) & 0xffffffff;
+	else
+		return ((pos >> 32) << 1) & 0xffffffff;
+}
+
+static inline __u32 pos2min_hash(struct file *filp, loff_t pos)
+{
+	if ((filp->f_flags & FMODE_32BITHASH) ||
+	    (!(filp->f_flags & FMODE_64BITHASH) && is_32bit_api()))
+		return 0;
+	else
+		return pos & 0xffffffff;
+}
+
+/*
+ * Return 32- or 64-bit end-of-file for dx directories
+ */
+static inline loff_t ext4_get_htree_eof(struct file *filp)
+{
+	if ((filp->f_mode & FMODE_32BITHASH) ||
+	    (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
+		return EXT4_HTREE_EOF_32BIT;
+	else
+		return EXT4_HTREE_EOF_64BIT;
+}
+
+
+/*
+ * ext4_dir_llseek() based on generic_file_llseek() to handle both
+ * non-htree and htree directories, where the "offset" is in terms
+ * of the filename hash value instead of the byte offset.
+ *
+ * NOTE: offsets obtained *before* ext4_set_inode_flag(dir, EXT4_INODE_INDEX)
+ *       will be invalid once the directory was converted into a dx directory
+ */
+loff_t ext4_dir_llseek(struct file *file, loff_t offset, int origin)
+{
+	struct inode *inode = file->f_mapping->host;
+	loff_t ret = -EINVAL;
+	int is_dx_dir = ext4_test_inode_flag(inode, EXT4_INODE_INDEX);
+
+	mutex_lock(&inode->i_mutex);
+
+	/* NOTE: relative offsets with dx directories might not work
+	 *       as expected, as it is difficult to figure out the
+	 *       correct offset between dx hashes */
+
+	switch (origin) {
+	case SEEK_END:
+		if (unlikely(offset > 0))
+			goto out_err; /* not supported for directories */
+
+		/* so only negative offsets are left, does that have a
+		 * meaning for directories at all? */
+		if (is_dx_dir)
+			offset += ext4_get_htree_eof(file);
+		else
+			offset += inode->i_size;
+		break;
+	case SEEK_CUR:
+		/*
+		 * Here we special-case the lseek(fd, 0, SEEK_CUR)
+		 * position-querying operation.  Avoid rewriting the "same"
+		 * f_pos value back to the file because a concurrent read(),
+		 * write() or lseek() might have altered it
+		 */
+		if (offset == 0) {
+			offset = file->f_pos;
+			goto out_ok;
+		}
+
+		offset += file->f_pos;
+		break;
+	}
+
+	if (unlikely(offset < 0))
+		goto out_err;
+
+	if (!is_dx_dir) {
+		if (offset > inode->i_sb->s_maxbytes)
+			goto out_err;
+	} else if (offset > ext4_get_htree_eof(file))
+		goto out_err;
+
+	/* Special lock needed here? */
+	if (offset != file->f_pos) {
+		file->f_pos = offset;
+		file->f_version = 0;
+	}
+
+out_ok:
+	ret = offset;
+out_err:
+	mutex_unlock(&inode->i_mutex);
+
+	return ret;
+}
 
 /*
  * This structure holds the nodes of the red-black tree used to store
@@ -330,15 +426,16 @@  static void free_rb_tree_fname(struct rb_root *root)
 }
 
 
-static struct dir_private_info *ext4_htree_create_dir_info(loff_t pos)
+static struct dir_private_info *ext4_htree_create_dir_info(struct file *filp,
+							   loff_t pos)
 {
 	struct dir_private_info *p;
 
 	p = kzalloc(sizeof(struct dir_private_info), GFP_KERNEL);
 	if (!p)
 		return NULL;
-	p->curr_hash = pos2maj_hash(pos);
-	p->curr_minor_hash = pos2min_hash(pos);
+	p->curr_hash = pos2maj_hash(filp, pos);
+	p->curr_minor_hash = pos2min_hash(filp, pos);
 	return p;
 }
 
@@ -429,7 +526,7 @@  static int call_filldir(struct file *filp, void *dirent,
 		       "null fname?!?\n");
 		return 0;
 	}
-	curr_pos = hash2pos(fname->hash, fname->minor_hash);
+	curr_pos = hash2pos(filp, fname->hash, fname->minor_hash);
 	while (fname) {
 		error = filldir(dirent, fname->name,
 				fname->name_len, curr_pos,
@@ -454,13 +551,13 @@  static int ext4_dx_readdir(struct file *filp,
 	int	ret;
 
 	if (!info) {
-		info = ext4_htree_create_dir_info(filp->f_pos);
+		info = ext4_htree_create_dir_info(filp, filp->f_pos);
 		if (!info)
 			return -ENOMEM;
 		filp->private_data = info;
 	}
 
-	if (filp->f_pos == EXT4_HTREE_EOF)
+	if (filp->f_pos == ext4_get_htree_eof(filp))
 		return 0;	/* EOF */
 
 	/* Some one has messed with f_pos; reset the world */
@@ -468,8 +565,8 @@  static int ext4_dx_readdir(struct file *filp,
 		free_rb_tree_fname(&info->root);
 		info->curr_node = NULL;
 		info->extra_fname = NULL;
-		info->curr_hash = pos2maj_hash(filp->f_pos);
-		info->curr_minor_hash = pos2min_hash(filp->f_pos);
+		info->curr_hash = pos2maj_hash(filp, filp->f_pos);
+		info->curr_minor_hash = pos2min_hash(filp, filp->f_pos);
 	}
 
 	/*
@@ -501,7 +598,7 @@  static int ext4_dx_readdir(struct file *filp,
 			if (ret < 0)
 				return ret;
 			if (ret == 0) {
-				filp->f_pos = EXT4_HTREE_EOF;
+				filp->f_pos = ext4_get_htree_eof(filp);
 				break;
 			}
 			info->curr_node = rb_first(&info->root);
@@ -521,7 +618,7 @@  static int ext4_dx_readdir(struct file *filp,
 			info->curr_minor_hash = fname->minor_hash;
 		} else {
 			if (info->next_hash == ~0) {
-				filp->f_pos = EXT4_HTREE_EOF;
+				filp->f_pos = ext4_get_htree_eof(filp);
 				break;
 			}
 			info->curr_hash = info->next_hash;
@@ -540,3 +637,15 @@  static int ext4_release_dir(struct inode *inode, struct file *filp)
 
 	return 0;
 }
+
+const struct file_operations ext4_dir_operations = {
+	.llseek		= ext4_dir_llseek,
+	.read		= generic_read_dir,
+	.readdir	= ext4_readdir,
+	.unlocked_ioctl = ext4_ioctl,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl	= ext4_compat_ioctl,
+#endif
+	.fsync		= ext4_sync_file,
+	.release	= ext4_release_dir,
+};
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 1554b15..d3fe1ea 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1599,7 +1599,11 @@  struct dx_hash_info
 	u32		*seed;
 };
 
-#define EXT4_HTREE_EOF	0x7fffffff
+
+/* 32 and 64 bit signed EOF for dx directories */
+#define EXT4_HTREE_EOF_32BIT   ((1UL  << (32 - 1)) - 1)
+#define EXT4_HTREE_EOF_64BIT   ((1ULL << (64 - 1)) - 1)
+
 
 /*
  * Control parameters used by ext4_htree_next_block
diff --git a/fs/ext4/hash.c b/fs/ext4/hash.c
index ac8f168..fa8e491 100644
--- a/fs/ext4/hash.c
+++ b/fs/ext4/hash.c
@@ -200,8 +200,8 @@  int ext4fs_dirhash(const char *name, int len, struct dx_hash_info *hinfo)
 		return -1;
 	}
 	hash = hash & ~1;
-	if (hash == (EXT4_HTREE_EOF << 1))
-		hash = (EXT4_HTREE_EOF-1) << 1;
+	if (hash == (EXT4_HTREE_EOF_32BIT << 1))
+		hash = (EXT4_HTREE_EOF_32BIT - 1) << 1;
 	hinfo->hash = hash;
 	hinfo->minor_hash = minor_hash;
 	return 0;