Patchwork A request to reserve a "tree id" field on ext[34] inodes

Submitter Pavel Emelyanov
Date Nov. 17, 2009, 2:04 p.m.
Message ID <4B02AD8B.2030202@openvz.org>
Permalink /patch/38634/
State New

Comments

Pavel Emelyanov - Nov. 17, 2009, 2:04 p.m.
Hi.

We have a proposal to implement a 2-level disk quota on ext3 and ext4.

In short, the aim is to have directories on ext3/4 partitions whose
disk usage and number of inodes are limited. The further plan is to
allow configuring uid and gid quotas within them.

The main use case for this is containers. When two or more of them are
located on one disk, their roots will be marked with a unique tree id,
and thus the disk consumption of each container can be limited. To
achieve this goal, having an id that says which tree an inode belongs
to is a key requirement.

So first we would like to ask to reserve a place on ext3 and ext4 inodes
for that ID.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

---

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andreas Dilger - Nov. 17, 2009, 5:06 p.m.
On 2009-11-17, at 06:04, Pavel Emelyanov wrote:
> We have a proposal to implement a 2-level disk quota on ext3 and ext4.
>
> In two words - the aim is to have directories on ext3/4 partitions
> which are limited by its disk usage and the number of inodes. Further
> the plan is to allow configuring uid and gid quotas within them.
>
> The main usage of this is containers. When two or more of them are
> located on one disk their roots will be marked with a unique tree id
> and thus the disk consumption of each container will be limited. While
> achieving this goal having an id of what tree an inode belongs to is
> a key requirement.

How do you handle files with multiple links, if they are located in  
different trees?  The inode would need to have multiple tree ids.

You can instead just store this data in an xattr (which will normally
be stored in the inode, so no performance impact), and then you are
free to store multiple values per inode.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

Jan Kara - Nov. 17, 2009, 5:12 p.m.
Hi,

> We have a proposal to implement a 2-level disk quota on ext3 and ext4.
> 
> In two words - the aim is to have directories on ext3/4 partitions
> which are limited by its disk usage and the number of inodes. Further
> the plan is to allow configuring uid and gid quotas within them.
  If I understand it right, this is something like XFS's project quota,
right? Note that such a thing has implications, e.g. you have to forbid
hardlinks between different "quota trees", otherwise it just won't fly...
Also, by 2-level, do you mean it won't be possible to nest such subtrees?
I.e. have a quota on directories a/, b/, a/b, a/c?

> The main usage of this is containers. When two or more of them are
> located on one disk their roots will be marked with a unique tree id
> and thus the disk consumption of each container will be limited. While
> achieving this goal having an id of what tree an inode belongs to is
> a key requirement.
> 
> So first we would like to ask to reserve a place on ext3 and ext4 inodes
> for that ID.
  Do you really need to store the tree ID on disk? I'd think that it should
be enough to keep some id / pointer in memory and initialize it when we
load the inode into memory (from the id / pointer of the parent directory).
Then it would be enough to store the fact that some directory is the root
of a "quota tree" somewhere - either in extended attributes, as a flag in
the inode, or together with the quota data.
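
The idea above can be sketched in userspace C as follows. This is only a
minimal model under stated assumptions, not kernel code: `struct model_inode`,
its fields, and `model_load_inode()` are hypothetical names, and only the
"this directory is a quota root" marker (with its id) is assumed to be
persistent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory inode for this sketch; not the kernel's struct inode. */
struct model_inode {
    uint32_t ino;
    int is_quota_root;      /* persistent marker (xattr or inode flag) */
    uint32_t root_tree_id;  /* persistent, meaningful only for a quota root */
    uint32_t tree_id;       /* in-memory only, filled in at load time */
};

/*
 * Initialise the in-memory tree id when an inode is loaded through its
 * parent directory: a quota root starts a new tree, any other inode
 * inherits the id of its parent.
 */
static void model_load_inode(struct model_inode *inode,
                             const struct model_inode *parent)
{
    assert(inode != NULL);
    if (inode->is_quota_root)
        inode->tree_id = inode->root_tree_id;
    else if (parent != NULL)
        inode->tree_id = parent->tree_id;
    else
        inode->tree_id = 0; /* no parent and no marker: no tree */
}
```

Note that the sketch only works when the parent is known at load time, which
is the case for an ordinary path walk.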

								Honza
Pavel Emelyanov - Nov. 17, 2009, 5:55 p.m.
Jan Kara wrote:
>   Hi,
> 
>> We have a proposal to implement a 2-level disk quota on ext3 and ext4.
>>
>> In two words - the aim is to have directories on ext3/4 partitions
>> which are limited by its disk usage and the number of inodes. Further
>> the plan is to allow configuring uid and gid quotas within them.
>   If I understand it right, this is something like XFS's project quota,
> right? 

Not exactly. XFS's tree quota actually replaces the gid one. My proposal
is to add a third id.

> Note that such thing has implications such as you have to forbid
> hardlinks between different "quota trees", otherwise it just won't fly...

Yes, I know. There are other things we'll have to disable as well, but
we can live without them.

> Also by 2-level, you mean it won't be possible to nest such subtrees?

As I see it - nesting can be done on top of it. I mean - once we have
a tree id of an inode and if we say "id A is a sub-id of id B" we're done.

As far as containers are concerned - we'll have to map the container id to
a quota tree id, since changing a container id is fast and simple, but
it's not so for a tree id. That said, this tree id is just a way to
distinguish the inodes of one sub-tree from the others.
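
One way to read "id A is a sub-id of id B" is that a charge against A is also
charged against B. The following is a minimal userspace sketch of that
reading, not part of the proposal: the table `model_trees`, the record
`struct model_tree_quota`, and `model_charge()` are hypothetical, and the
sub-id chain is assumed to be acyclic.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MAX_TREES 16  /* arbitrary table size for the sketch */

/* Hypothetical per-tree quota record; id 0 means "no tree". */
struct model_tree_quota {
    uint32_t parent_id;    /* 0 if this tree is not a sub-id of another */
    uint64_t used_blocks;
    uint64_t block_limit;  /* 0 means unlimited */
};

static struct model_tree_quota model_trees[MODEL_MAX_TREES];

/*
 * Charge nblocks to tree_id and to every tree it is a sub-id of.
 * Returns 0 on success, or -1 if any tree in the chain would exceed
 * its limit, in which case nothing is charged.
 */
static int model_charge(uint32_t tree_id, uint64_t nblocks)
{
    uint32_t id;

    /* First pass: check every limit along the chain. */
    for (id = tree_id; id != 0; id = model_trees[id].parent_id) {
        const struct model_tree_quota *q;

        assert(id < MODEL_MAX_TREES);
        q = &model_trees[id];
        if (q->block_limit && q->used_blocks + nblocks > q->block_limit)
            return -1;
    }
    /* Second pass: all checks passed, so account the blocks. */
    for (id = tree_id; id != 0; id = model_trees[id].parent_id)
        model_trees[id].used_blocks += nblocks;
    return 0;
}
```

With a single id stored per inode, the nesting relation lives entirely in the
quota data, which is why the inode itself does not need more than one id in
this model.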

> I.e. have a quota on directories a/, b/, a/b, a/c?
> 
>> The main usage of this is containers. When two or more of them are
>> located on one disk their roots will be marked with a unique tree id
>> and thus the disk consumption of each container will be limited. While
>> achieving this goal having an id of what tree an inode belongs to is
>> a key requirement.
>>
>> So first we would like to ask to reserve a place on ext3 and ext4 inodes
>> for that ID.
>   Do you really need to store tree ID on disk? I'd think that it should
> be enough to keep some id / pointer in memory and initialize it when we
> load inode into memory (from an id / pointer of parent directory). Then
> it would be enough to store a fact that some directory is a root of
> "quota tree" somewhere - either in extended attributes, as a flag in
> the inode, or together with quota data.

We can't do it inside ext4_nfs_get_inode unfortunately :(
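
The problem can be illustrated with a small userspace model. An NFS file
handle identifies an inode by number and generation, so at that point there
is no parent directory to inherit an id from. Everything below
(`struct model_disk_inode`, `model_get_by_handle()`, `MODEL_TREE_UNKNOWN`)
is hypothetical and only sketches the difference between an id stored with
the inode and an id derived from the parent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_TREE_UNKNOWN UINT32_MAX /* hypothetical "not known" marker */

/* Hypothetical on-disk inode record for this sketch. */
struct model_disk_inode {
    uint32_t ino;
    uint32_t generation;
    uint32_t tree_id; /* only meaningful if the id is stored on disk */
};

/*
 * Look an inode up the way a file-handle lookup does: by number and
 * generation only. With has_disk_id == 0 the only source of the tree id
 * would be a parent directory, which is not available here, so the id
 * is reported as unknown.
 */
static const struct model_disk_inode *
model_get_by_handle(const struct model_disk_inode *table, size_t n,
                    uint32_t ino, uint32_t generation,
                    int has_disk_id, uint32_t *tree_id)
{
    size_t i;

    assert(tree_id != NULL);
    for (i = 0; i < n; i++) {
        if (table[i].ino != ino || table[i].generation != generation)
            continue;
        *tree_id = has_disk_id ? table[i].tree_id : MODEL_TREE_UNKNOWN;
        return &table[i];
    }
    return NULL; /* stale handle: no such inode/generation */
}
```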

> 								Honza

Jan Kara - Nov. 17, 2009, 6:47 p.m.
> Jan Kara wrote:
> >   Hi,
> > 
> >> We have a proposal to implement a 2-level disk quota on ext3 and ext4.
> >>
> >> In two words - the aim is to have directories on ext3/4 partitions
> >> which are limited by its disk usage and the number of inodes. Further
> >> the plan is to allow configuring uid and gid quotas within them.
> >   If I understand it right, this is something like XFS's project quota,
> > right? 
> 
> Not exactly. XFS tree quota actually replaces gid one. My proposal is
> to add the 3rd id.
  Yeah, OK, but it's quite similar :)

> > Also by 2-level, you mean it won't be possible to nest such subtrees?
> 
> As I see it - nesting can be done on top of it. I mean - once we have
> a tree id of an inode and if we say "id A is a sub-id of id B" we're done.
  But for the implementation, it's kind of important whether there is
going to be just one "tree" limitation for each inode, or an arbitrary
number of them...

> > I.e. have a quota on directories a/, b/, a/b, a/c?
> > 
> >> The main usage of this is containers. When two or more of them are
> >> located on one disk their roots will be marked with a unique tree id
> >> and thus the disk consumption of each container will be limited. While
> >> achieving this goal having an id of what tree an inode belongs to is
> >> a key requirement.
> >>
> >> So first we would like to ask to reserve a place on ext3 and ext4 inodes
> >> for that ID.
> >   Do you really need to store tree ID on disk? I'd think that it should
> > be enough to keep some id / pointer in memory and initialize it when we
> > load inode into memory (from an id / pointer of parent directory). Then
> > it would be enough to store a fact that some directory is a root of
> > "quota tree" somewhere - either in extended attributes, as a flag in
> > the inode, or together with quota data.
> We can't do it inside ext4_nfs_get_inode unfortunately :(
  Right, that's nasty. OK, but as Andreas suggested, extended attributes
are more flexible for this - most notably every fs supporting them would
be able to support your tree quota extension.

								Honza
Dmitri Monakho - Nov. 17, 2009, 9:19 p.m.
Andreas Dilger <adilger@sun.com> writes:

> On 2009-11-17, at 06:04, Pavel Emelyanov wrote:
>> We have a proposal to implement a 2-level disk quota on ext3 and ext4.
>>
>> In two words - the aim is to have directories on ext3/4 partitions
>> which are limited by its disk usage and the number of inodes. Further
>> the plan is to allow configuring uid and gid quotas within them.
>>
>> The main usage of this is containers. When two or more of them are
>> located on one disk their roots will be marked with a unique tree id
>> and thus the disk consumption of each container will be limited. While
>> achieving this goal having an id of what tree an inode belongs to is
>> a key requirement.
>
> How do you handle files with multiple links, if they are located in
> different trees?  The inode would need to have multiple tree ids.
The short answer is "no": an inode cannot belong to multiple trees.
Containers have some non-obvious specifics.
Each container is isolated from the others as much as possible.
A container has its own root tree. This tree is exported inside the
CT in one of several possible ways (namespace, virtual stack-fs, chroot).

So a container's root is an independent tree, or several trees.
Usually they are organized as follows: /ct_root/CT_${ID}/${tree_content}
There are many reasons to keep these trees separate from one another:
   - inode attr:
     If an inode has links in both trees A and B, and a user in A calls
     chown() on this inode, then B's owner will be surprised.
     The only way to overcome this is to virtualize inode attributes
     (for each tree), which is madness IMHO.
   - checkpoint/restore/online-backup:
     This is like suspend/resume for a VM, but in this case only the
     container's processes are stopped (frozen) for some time. After the
     CT's processes are stopped we may back up the CT's tree without
     freezing the FS as a whole.
As I already said, there are many ways to accomplish this task, but each
of them has serious disadvantages:
Virtual block devices (qemu-like): problems with consistency and performance.
ext3/4 + stack-fs (unionfs/vzfs): poor failure resistance. It is
        impossible to support a journalled quota file at the stack-fs level.
XFS with proj quota: lack of quota file journalling. XFS itself
        (please don't blame me, but I'm really not a huge XFS fan).

So the only way to implement journalled quota for containers is to
implement it on native fs level.

"Containers directory tree-id" assumptions:
(1) The tree id is embedded inside the inode
(2) The tree id is inherited from the parent dir
(3) An inode cannot belong to different directory trees

The default directory tree (with id == 0) has a special meaning.
A directory which belongs to the default tree may contain the roots of
other trees. The default tree is used for subtree manipulation.

->rename restriction:
  if (S_ISDIR(old_inode->i_mode)) {
      if ((new_dir->i_tree_id == 0) || /* move to default tree */
          (new_dir->i_tree_id == old_inode->i_tree_id)) /* same tree */
              goto good;
      return -EXDEV;
  } else {
      /* If the entry has more than one link, it is a bad idea to allow
         renaming it into a different tree (even the default tree),
         because this would violate rule (3). */
      if ((old_inode->i_nlink > 1) &&
          (new_dir->i_tree_id != old_inode->i_tree_id))
              return -EXDEV;
  }
->link restriction: /* Links may belong to only one tree */
  if (new_dir->i_tree_id != old_inode->i_tree_id)
          return -EXDEV;
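
The two restrictions above can be exercised outside the kernel with a small
model. This is a sketch under assumptions, not the proof-of-concept patch:
`struct tree_inode`, `model_may_rename()` and `model_may_link()` are
hypothetical userspace stand-ins that keep only the fields the checks use.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in holding only the fields the checks need. */
struct tree_inode {
    int is_dir;
    unsigned int nlink;
    uint32_t tree_id; /* 0 is the default tree */
};

/* May old_inode be renamed into directory new_dir? 0 or -EXDEV. */
static int model_may_rename(const struct tree_inode *old_inode,
                            const struct tree_inode *new_dir)
{
    assert(old_inode != NULL && new_dir != NULL);
    if (old_inode->is_dir) {
        /* Directories may move within their tree or into the default tree. */
        if (new_dir->tree_id == 0 ||
            new_dir->tree_id == old_inode->tree_id)
            return 0;
        return -EXDEV;
    }
    /* A multiply-linked file must stay in its tree to keep rule (3). */
    if (old_inode->nlink > 1 && new_dir->tree_id != old_inode->tree_id)
        return -EXDEV;
    return 0;
}

/* May a new link to old_inode be created in new_dir? 0 or -EXDEV. */
static int model_may_link(const struct tree_inode *old_inode,
                          const struct tree_inode *new_dir)
{
    assert(old_inode != NULL && new_dir != NULL);
    if (new_dir->tree_id != old_inode->tree_id)
        return -EXDEV;
    return 0;
}
```

In this model a singly-linked regular file may still be renamed into another
tree, as in the pseudocode above; the sketch does not show how its id and
quota usage would then be transferred.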

>
> You can instead just store this data in an xattr (which will normally
> be stored in the inode, so no performance impact), and then you are
> free to store multiple values per inode.
Yes, an xattr is possible, but struct ext4_xattr_entry is quite big, plus
the space for attr_name..., and we only want 4 bytes.
In fact I've made a proof-of-concept patch that contains everything
necessary for tree quota support. I'll post it if you are interested.
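
To put a rough number on the overhead, the sketch below estimates the space
one xattr holding a 4-byte id would take. The struct is a userspace mirror of
the fixed part of an xattr entry written from memory, and the 4-byte padding
rule and the attribute name "tree_id" are assumptions for illustration; check
fs/ext4/xattr.h before relying on any of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of the fixed part of an xattr entry (mirror, not the kernel struct). */
struct model_xattr_entry {
    uint8_t  e_name_len;
    uint8_t  e_name_index;
    uint16_t e_value_offs;
    uint32_t e_value_block;
    uint32_t e_value_size;
    uint32_t e_hash;
    /* the attribute name follows, padded to a 4-byte boundary */
};

#define MODEL_XATTR_PAD 4u /* assumed alignment of names and values */

/* Round n up to the next multiple of MODEL_XATTR_PAD. */
static size_t model_round_up(size_t n)
{
    return (n + MODEL_XATTR_PAD - 1) & ~(size_t)(MODEL_XATTR_PAD - 1);
}

/* Bytes consumed by one entry: header plus padded name, plus padded value. */
static size_t model_xattr_cost(size_t name_len, size_t value_len)
{
    assert(name_len <= 255); /* e_name_len is a single byte */
    return model_round_up(sizeof(struct model_xattr_entry) + name_len) +
           model_round_up(value_len);
}
```

Under these assumptions `model_xattr_cost(7, 4)`, i.e. a 7-character name
such as "tree_id" with a 4-byte value, comes to 28 bytes, compared with 4
bytes for a dedicated inode field.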

>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
Dmitri Monakho - Nov. 17, 2009, 9:19 p.m.
Jan Kara <jack@suse.cz> writes:

>> Jan Kara wrote:
>> >   Hi,
>> > 
>> >> We have a proposal to implement a 2-level disk quota on ext3 and ext4.
>> >>
>> >> In two words - the aim is to have directories on ext3/4 partitions
>> >> which are limited by its disk usage and the number of inodes. Further
>> >> the plan is to allow configuring uid and gid quotas within them.
>> >   If I understand it right, this is something like XFS's project quota,
>> > right? 
>> 
>> Not exactly. XFS tree quota actually replaces gid one. My proposal is
>> to add the 3rd id.
>   Yeah, OK, but it's quite similar :)
>
>> > Also by 2-level, you mean it won't be possible to nest such subtrees?
>> 
>> As I see it - nesting can be done on top of it. I mean - once we have
>> a tree id of an inode and if we say "id A is a sub-id of id B" we're done.
>   But for implementation, it's kind of important whether there is going
> to be just one "tree" limitation for each inode, or arbitrary number of
> them...
>
>> > I.e. have a quota on directories a/, b/, a/b, a/c?
>> > 
I've posted the fs assumptions in my reply to Andreas.
>> >> The main usage of this is containers. When two or more of them are
>> >> located on one disk their roots will be marked with a unique tree id
>> >> and thus the disk consumption of each container will be limited. While
>> >> achieving this goal having an id of what tree an inode belongs to is
>> >> a key requirement.
>> >>
>> >> So first we would like to ask to reserve a place on ext3 and ext4 inodes
>> >> for that ID.
>> >   Do you really need to store tree ID on disk? I'd think that it should
>> > be enough to keep some id / pointer in memory and initialize it when we
>> > load inode into memory (from an id / pointer of parent directory). Then
>> > it would be enough to store a fact that some directory is a root of
>> > "quota tree" somewhere - either in extended attributes, as a flag in
>> > the inode, or together with quota data.
>> We can't do it inside ext4_nfs_get_inode unfortunately :(
Also, we would have problems with orphan list cleanup after an unclean umount.
>   Right, that's nasty. OK, but as Andreas suggested, extended attributes
> are more flexible for this - most notably every fs supporting them would
> be able to support your tree quota extension.
>
> 								Honza

Patch

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 26d3cf8..0fda97c 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -471,7 +471,7 @@  struct ext4_inode {
 			__le16	l_i_file_acl_high;
 			__le16	l_i_uid_high;	/* these 2 fields */
 			__le16	l_i_gid_high;	/* were reserved2[0] */
-			__u32	l_i_reserved2;
+			__u32	l_i_tree_id;	/* reserved for 2-level disk quota */
 		} linux2;
 		struct {
 			__le16	h_i_reserved1;	/* Obsoleted fragment number/size which are removed in ext4 */
@@ -585,7 +585,7 @@  do {									       \
 #define i_gid_low	i_gid
 #define i_uid_high	osd2.linux2.l_i_uid_high
 #define i_gid_high	osd2.linux2.l_i_gid_high
-#define i_reserved2	osd2.linux2.l_i_reserved2
+#define i_tree_id	osd2.linux2.l_i_tree_id
 
 #elif defined(__GNU__)
 
diff --git a/include/linux/ext3_fs.h b/include/linux/ext3_fs.h
index 7499b36..d9f633d 100644
--- a/include/linux/ext3_fs.h
+++ b/include/linux/ext3_fs.h
@@ -320,7 +320,7 @@  struct ext3_inode {
 			__u16	i_pad1;
 			__le16	l_i_uid_high;	/* these 2 fields    */
 			__le16	l_i_gid_high;	/* were reserved2[0] */
-			__u32	l_i_reserved2;
+			__u32	l_i_tree_id;	/* reserved for 2-level disk quota */
 		} linux2;
 		struct {
 			__u8	h_i_frag;	/* Fragment number */
@@ -351,7 +351,7 @@  struct ext3_inode {
 #define i_gid_low	i_gid
 #define i_uid_high	osd2.linux2.l_i_uid_high
 #define i_gid_high	osd2.linux2.l_i_gid_high
-#define i_reserved2	osd2.linux2.l_i_reserved2
+#define i_tree_id	osd2.linux2.l_i_tree_id
 
 #elif defined(__GNU__)