Message ID | 4B02AD8B.2030202@openvz.org |
---|---|
State | Rejected, archived |
On 2009-11-17, at 06:04, Pavel Emelyanov wrote:
> We have a proposal to implement a 2-level disk quota on ext3 and ext4.
>
> In two words - the aim is to have directories on ext3/4 partitions
> which are limited by its disk usage and the number of inodes. Further
> the plan is to allow configuring uid and gid quotas within them.
>
> The main usage of this is containers. When two or more of them are
> located on one disk their roots will be marked with a unique tree id
> and thus the disk consumption of each container will be limited. While
> achieving this goal having an id of what tree an inode belongs to is
> a key requirement.

How do you handle files with multiple links, if they are located in
different trees? The inode would need to have multiple tree ids.

You can instead just store this data in an xattr (which will normally
be stored in the inode, so no performance impact), and then you are
free to store multiple values per inode.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

  Hi,

> We have a proposal to implement a 2-level disk quota on ext3 and ext4.
>
> In two words - the aim is to have directories on ext3/4 partitions
> which are limited by its disk usage and the number of inodes. Further
> the plan is to allow configuring uid and gid quotas within them.

  If I understand it right, this is something like XFS's project quota,
right? Note that such a thing has implications: for example, you have to
forbid hardlinks between different "quota trees", otherwise it just won't
fly...
  Also by 2-level, you mean it won't be possible to nest such subtrees?
I.e. have a quota on directories a/, b/, a/b, a/c?

> The main usage of this is containers. When two or more of them are
> located on one disk their roots will be marked with a unique tree id
> and thus the disk consumption of each container will be limited. While
> achieving this goal having an id of what tree an inode belongs to is
> a key requirement.
>
> So first we would like to ask to reserve a place on ext3 and ext4 inodes
> for that ID.

  Do you really need to store the tree ID on disk? I'd think that it should
be enough to keep some id / pointer in memory and initialize it when we
load the inode into memory (from the id / pointer of the parent directory).
Then it would be enough to store the fact that some directory is a root of
a "quota tree" somewhere - either in extended attributes, as a flag in
the inode, or together with quota data.

									Honza

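The in-memory alternative described above can be sketched in a few lines of userspace C. This is only a model under stated assumptions: `struct mem_inode`, its fields, and `mem_inode_init_tree_id()` are hypothetical names made up for illustration, not kernel structures or functions. Only the "is a tree root" marker (and that root's id) is treated as persistent; every other inode copies the id from its parent when it is loaded.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory inode model; not a kernel structure. */
struct mem_inode {
	const struct mem_inode *parent;	/* NULL if no parent is known at load time */
	bool     is_tree_root;		/* persistent marker (xattr, flag, or quota data) */
	uint32_t root_tree_id;		/* meaningful only when is_tree_root is set */
	uint32_t tree_id;		/* in-memory only, filled in at load time */
};

/* Initialise the in-memory tree id when an inode is loaded. */
static void mem_inode_init_tree_id(struct mem_inode *inode)
{
	if (inode->is_tree_root)
		inode->tree_id = inode->root_tree_id;	/* start of a quota tree */
	else if (inode->parent != NULL)
		inode->tree_id = inode->parent->tree_id;	/* inherit from parent */
	else
		inode->tree_id = 0;	/* no parent known: fall back to the default tree */
}
```

The last branch is the weak spot of this approach: the model has nothing better than a default to offer when an inode is reached without going through its parent directory.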
Jan Kara wrote:
>   Hi,
>
>> We have a proposal to implement a 2-level disk quota on ext3 and ext4.
>>
>> In two words - the aim is to have directories on ext3/4 partitions
>> which are limited by its disk usage and the number of inodes. Further
>> the plan is to allow configuring uid and gid quotas within them.
>   If I understand it right, this is something like XFS's project quota,
> right?

Not exactly. XFS tree quota actually replaces the gid one. My proposal is
to add a 3rd id.

> Note that such thing has implications such as you have to forbid
> hardlinks between different "quota trees", otherwise it just won't fly...

Yes, I know. We know of other things we'll have to disable, but it is
OK to live without them.

>   Also by 2-level, you mean it won't be possible to nest such subtrees?

As I see it - nesting can be done on top of it. I mean - once we have
a tree id of an inode and if we say "id A is a sub-id of id B" we're done.

As far as containers are concerned - we'll have to map container id to
quota tree id, since changing a container id is fast and simple, but it's
not so for tree id. That said, this treeid is just a way to distinguish
inodes of one sub-tree from the others.

> I.e. have a quota on directories a/, b/, a/b, a/c?
>
>> The main usage of this is containers. When two or more of them are
>> located on one disk their roots will be marked with a unique tree id
>> and thus the disk consumption of each container will be limited. While
>> achieving this goal having an id of what tree an inode belongs to is
>> a key requirement.
>>
>> So first we would like to ask to reserve a place on ext3 and ext4 inodes
>> for that ID.
>   Do you really need to store tree ID on disk? I'd think that it should
> be enough to keep some id / pointer in memory and initialize it when we
> load inode into memory (from an id / pointer of parent directory). Then
> it would be enough to store a fact that some directory is a root of
> "quota tree" somewhere - either in extended attributes, as a flag in
> the inode, or together with quota data.

We can't do it inside ext4_nfs_get_inode unfortunately :(

> 									Honza

> Jan Kara wrote:
> >   Hi,
> >
> >> We have a proposal to implement a 2-level disk quota on ext3 and ext4.
> >>
> >> In two words - the aim is to have directories on ext3/4 partitions
> >> which are limited by its disk usage and the number of inodes. Further
> >> the plan is to allow configuring uid and gid quotas within them.
> >   If I understand it right, this is something like XFS's project quota,
> > right?
>
> Not exactly. XFS tree quota actually replaces gid one. My proposal is
> to add the 3rd id.

  Yeah, OK, but it's quite similar :)

> >   Also by 2-level, you mean it won't be possible to nest such subtrees?
>
> As I see it - nesting can be done on top of it. I mean - once we have
> a tree id of an inode and if we say "id A is a sub-id of id B" we're done.

  But for implementation, it's kind of important whether there is going
to be just one "tree" limitation for each inode, or an arbitrary number of
them...

> > I.e. have a quota on directories a/, b/, a/b, a/c?
> >
> >> The main usage of this is containers. When two or more of them are
> >> located on one disk their roots will be marked with a unique tree id
> >> and thus the disk consumption of each container will be limited. While
> >> achieving this goal having an id of what tree an inode belongs to is
> >> a key requirement.
> >>
> >> So first we would like to ask to reserve a place on ext3 and ext4 inodes
> >> for that ID.
> >   Do you really need to store tree ID on disk? I'd think that it should
> > be enough to keep some id / pointer in memory and initialize it when we
> > load inode into memory (from an id / pointer of parent directory). Then
> > it would be enough to store a fact that some directory is a root of
> > "quota tree" somewhere - either in extended attributes, as a flag in
> > the inode, or together with quota data.
> We can't do it inside ext4_nfs_get_inode unfortunately :(

  Right, that's nasty. OK, but as Andreas suggested, extended attributes
are more flexible for this - most notably every fs supporting them would
be able to support your tree quota extension.

									Honza

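To put a rough number on the xattr option, the sketch below estimates how many bytes one small attribute would occupy. It assumes the 16-byte ext4 xattr entry header and 4-byte padding as I recall them from fs/ext4/xattr.h; the `model_*` names are hypothetical, and the 7-character attribute name is only an example, not something proposed in this thread. Treat the result as an order-of-magnitude estimate, not an exact on-disk figure.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_XATTR_PAD   4u
#define MODEL_XATTR_ROUND (MODEL_XATTR_PAD - 1u)

/* Assumed layout of the fixed part of an ext4 xattr entry (16 bytes). */
struct model_xattr_entry {
	uint8_t  e_name_len;	/* length of the name */
	uint8_t  e_name_index;	/* attribute name index (namespace) */
	uint16_t e_value_offs;	/* offset of the value */
	uint32_t e_value_block;	/* historically the value's block; unused here */
	uint32_t e_value_size;	/* size of the value */
	uint32_t e_hash;	/* hash of name and value */
};

/* Entry header plus name, rounded up to the 4-byte padding. */
static size_t model_xattr_entry_len(size_t name_len)
{
	return (name_len + MODEL_XATTR_ROUND + sizeof(struct model_xattr_entry))
		& ~(size_t)MODEL_XATTR_ROUND;
}

/* Value size rounded up to the 4-byte padding. */
static size_t model_xattr_value_len(size_t value_size)
{
	return (value_size + MODEL_XATTR_ROUND) & ~(size_t)MODEL_XATTR_ROUND;
}

/* Total bytes consumed by one attribute: entry + name + value. */
static size_t model_xattr_total(size_t name_len, size_t value_size)
{
	return model_xattr_entry_len(name_len) + model_xattr_value_len(value_size);
}
```

Under these assumptions, `model_xattr_total(7, 4)` gives 28 bytes for a 4-byte id stored under a 7-character name, compared with 4 bytes for a dedicated inode field. In exchange, the xattr form is not tied to one filesystem's inode layout.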
Andreas Dilger <adilger@sun.com> writes:

> On 2009-11-17, at 06:04, Pavel Emelyanov wrote:
>> We have a proposal to implement a 2-level disk quota on ext3 and ext4.
>>
>> In two words - the aim is to have directories on ext3/4 partitions
>> which are limited by its disk usage and the number of inodes. Further
>> the plan is to allow configuring uid and gid quotas within them.
>>
>> The main usage of this is containers. When two or more of them are
>> located on one disk their roots will be marked with a unique tree id
>> and thus the disk consumption of each container will be limited. While
>> achieving this goal having an id of what tree an inode belongs to is
>> a key requirement.
>
> How do you handle files with multiple links, if they are located in
> different trees? The inode would need to have multiple tree ids.

The short answer is "no": an inode cannot belong to multiple trees.
Containers have some non-obvious specifics. Each container is isolated
from the others as much as possible. A container has its own root tree.
This tree is exported inside the CT in one of several possible ways
(namespace, virtual-stack-fs, chroot), so a container's root is an
independent tree or several trees. Usually they are organized as follows:

    /ct_root/CT_${ID}/${tree_content}

There are many reasons to keep these trees separate from one another:
- inode attr: if an inode has links in both tree A and tree B, and A's
  user calls chown() on this inode, then B's owner will be surprised.
  The only way to overcome this is to virtualize inode attributes
  (for each tree), which is madness IMHO.
- checkpoint/restore/online-backup: this is like suspend/resume for a VM,
  but in this case only the container's processes are stopped (frozen)
  for some time. After the CT's processes are stopped we may create a
  backup of the CT's tree without freezing the FS as a whole.

As I already said, there are many ways to accomplish this task, but each
of them has strong disadvantages:
- Virtual block devices (qemu-like): problems with consistency and
  performance.
- ext3/4 + stack-fs (unionfs/vzfs): bad failure resistance. It is
  impossible to support a journalled quota file at the stack-fs level.
- XFS with proj quota: lack of quota file journalling, and XFS itself
  (please don't blame me, but I'm really not a huge XFS fan).
So the only way to implement journalled quota for containers is to
implement it at the native fs level.

"Containers directory tree-id" assumptions:
(1) The tree id is embedded inside the inode
(2) The tree id is inherited from the parent dir
(3) An inode cannot belong to different directory trees

The default directory tree (with id == 0) has a special meaning: a
directory which belongs to the default tree may contain the roots of
other trees. The default tree is used for subtree manipulation.

->rename restriction:

    if (S_ISDIR(old_inode->i_mode)) {
            if ((new_dir->i_tree_id == 0) || /* move to default tree */
                (new_dir->i_tree_id == old_inode->i_tree_id)) /* same tree */
                    goto good;
            return -EXDEV;
    } else {
            /*
             * If the entry has more than one link, it is a bad idea to
             * allow renaming it into a different tree (even the default
             * tree), because this would violate rule (3).
             */
            if ((old_inode->i_nlink > 1) &&
                (new_dir->i_tree_id != old_inode->i_tree_id))
                    return -EXDEV;
    }

->link restriction:

    /* Links may belong to only one tree */
    if (new_dir->i_tree_id != old_inode->i_tree_id)
            return -EXDEV;

>
> You can instead just store this data in an xattr (which will normally
> be stored in the inode, so no performance impact), and then you are
> free to store multiple values per inode.

Yes, an xattr is possible, but struct ext4_xattr_entry is so big, plus
the space for the attr name..., and we only want 4 bytes.
In fact I've made a proof-of-concept patch; it contains everything
necessary for tree quota support. I'll post it if you are interested.

>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.

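The ->link and ->rename restrictions described in the message above can be exercised outside the kernel with a small model. This is a sketch only: `struct model_inode`, `tree_may_link()` and `tree_may_rename()` are hypothetical userspace stand-ins, and the real checks would operate on `struct inode` inside the filesystem's link and rename paths.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the inode fields the rules look at. */
struct model_inode {
	uint32_t tree_id;	/* 0 means the default tree */
	uint32_t nlink;		/* number of hard links */
	bool     is_dir;
};

/* ->link rule: all links to an inode must live in a single tree. */
static int tree_may_link(const struct model_inode *old_inode,
			 const struct model_inode *new_dir)
{
	if (new_dir->tree_id != old_inode->tree_id)
		return -EXDEV;
	return 0;
}

/* ->rename rule, mirroring the pseudocode from the message. */
static int tree_may_rename(const struct model_inode *old_inode,
			   const struct model_inode *new_dir)
{
	if (old_inode->is_dir) {
		/* Directories may move within their tree or into the default tree. */
		if (new_dir->tree_id == 0 ||
		    new_dir->tree_id == old_inode->tree_id)
			return 0;
		return -EXDEV;
	}
	/* A multiply-linked file must not leave its tree (rule 3). */
	if (old_inode->nlink > 1 && new_dir->tree_id != old_inode->tree_id)
		return -EXDEV;
	return 0;
}
```

As in the pseudocode, a regular file with a single link passes the rename check even when the target directory belongs to another tree; the model does not say how its tree id or quota accounting would then be updated.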
Jan Kara <jack@suse.cz> writes:

>> Jan Kara wrote:
>> >   Hi,
>> >
>> >> We have a proposal to implement a 2-level disk quota on ext3 and ext4.
>> >>
>> >> In two words - the aim is to have directories on ext3/4 partitions
>> >> which are limited by its disk usage and the number of inodes. Further
>> >> the plan is to allow configuring uid and gid quotas within them.
>> >   If I understand it right, this is something like XFS's project quota,
>> > right?
>>
>> Not exactly. XFS tree quota actually replaces gid one. My proposal is
>> to add the 3rd id.
>   Yeah, OK, but it's quite similar :)
>
>> >   Also by 2-level, you mean it won't be possible to nest such subtrees?
>>
>> As I see it - nesting can be done on top of it. I mean - once we have
>> a tree id of an inode and if we say "id A is a sub-id of id B" we're done.
>   But for implementation, it's kind of important whether there is going
> to be just one "tree" limitation for each inode, or arbitrary number of
> them...
>
>> > I.e. have a quota on directories a/, b/, a/b, a/c?
>> >

I've posted the fs assumptions in my reply to Andreas.

>> >> The main usage of this is containers. When two or more of them are
>> >> located on one disk their roots will be marked with a unique tree id
>> >> and thus the disk consumption of each container will be limited. While
>> >> achieving this goal having an id of what tree an inode belongs to is
>> >> a key requirement.
>> >>
>> >> So first we would like to ask to reserve a place on ext3 and ext4 inodes
>> >> for that ID.
>> >   Do you really need to store tree ID on disk? I'd think that it should
>> > be enough to keep some id / pointer in memory and initialize it when we
>> > load inode into memory (from an id / pointer of parent directory). Then
>> > it would be enough to store a fact that some directory is a root of
>> > "quota tree" somewhere - either in extended attributes, as a flag in
>> > the inode, or together with quota data.
>> We can't do it inside ext4_nfs_get_inode unfortunately :(

Also, we will have problems with orphan list cleanup after an unclean
umount.

>   Right, that's nasty. OK, but as Andreas suggested, extended attributes
> are more flexible for this - most notably every fs supporting them would
> be able to support your tree quota extension.
>
> 									Honza

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 26d3cf8..0fda97c 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -471,7 +471,7 @@ struct ext4_inode {
 			__le16	l_i_file_acl_high;
 			__le16	l_i_uid_high;	/* these 2 fields */
 			__le16	l_i_gid_high;	/* were reserved2[0] */
-			__u32	l_i_reserved2;
+			__u32	l_i_tree_id;	/* reserved for 2-level disk quota */
 		} linux2;
 		struct {
 			__le16	h_i_reserved1;	/* Obsoleted fragment number/size which are removed in ext4 */
@@ -585,7 +585,7 @@ do { \
 #define i_gid_low	i_gid
 #define i_uid_high	osd2.linux2.l_i_uid_high
 #define i_gid_high	osd2.linux2.l_i_gid_high
-#define i_reserved2	osd2.linux2.l_i_reserved2
+#define i_tree_id	osd2.linux2.l_i_tree_id
 
 #elif defined(__GNU__)
 
diff --git a/include/linux/ext3_fs.h b/include/linux/ext3_fs.h
index 7499b36..d9f633d 100644
--- a/include/linux/ext3_fs.h
+++ b/include/linux/ext3_fs.h
@@ -320,7 +320,7 @@ struct ext3_inode {
 			__u16	i_pad1;
 			__le16	l_i_uid_high;	/* these 2 fields */
 			__le16	l_i_gid_high;	/* were reserved2[0] */
-			__u32	l_i_reserved2;
+			__u32	l_i_tree_id;	/* reserved for 2-level disk quota */
 		} linux2;
 		struct {
 			__u8	h_i_frag;	/* Fragment number */
@@ -351,7 +351,7 @@ struct ext3_inode {
 #define i_gid_low	i_gid
 #define i_uid_high	osd2.linux2.l_i_uid_high
 #define i_gid_high	osd2.linux2.l_i_gid_high
-#define i_reserved2	osd2.linux2.l_i_reserved2
+#define i_tree_id	osd2.linux2.l_i_tree_id
 
 #elif defined(__GNU__)
 
Hi.

We have a proposal to implement a 2-level disk quota on ext3 and ext4.

In two words - the aim is to have directories on ext3/4 partitions
which are limited by its disk usage and the number of inodes. Further
the plan is to allow configuring uid and gid quotas within them.

The main usage of this is containers. When two or more of them are
located on one disk their roots will be marked with a unique tree id
and thus the disk consumption of each container will be limited. While
achieving this goal having an id of what tree an inode belongs to is
a key requirement.

So first we would like to ask to reserve a place on ext3 and ext4 inodes
for that ID.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

---