Patchwork [RFC] Adding quotacheck functionality to e2fsck

Submitter Theodore Ts'o
Date March 26, 2010, 12:20 a.m.
Message ID <E1NuxHe-0005cj-JN@closure.thunk.org>
Permalink /patch/48607/
State Superseded

Comments

Theodore Ts'o - March 26, 2010, 12:20 a.m.
This is something I whipped up last night to speed up quotacheck by
doing the data collection in e2fsck.  If e2fsck runs and does a full
check, it's likely that quotacheck needs to be run as well --- and it's
faster if e2fsck does the dirty work of fetching the information since
(1) it needs to paw through all of the inodes anyway, and (2) quotacheck
has to go through the file system and iterate over the files in a
non-optimal order.
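
A minimal sketch of the single-pass accounting described above, in C. It assumes a simplified, hypothetical inode record (`struct scan_inode`) and a fixed-size hash table keyed by id; real e2fsck code would get inodes from libext2fs's inode scan and use its own data structures, so none of these names come from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of an inode as seen during the scan. */
struct scan_inode {
    uint32_t uid;
    uint32_t gid;
    uint64_t blocks;   /* blocks charged to the owner */
    int      in_use;   /* free inodes are not charged to anyone */
};

struct usage {
    uint32_t id;
    int      used;     /* is this slot occupied? */
    uint64_t blocks;
    uint64_t inodes;
};

#define USAGE_SLOTS 1024  /* illustrative; must exceed the number of distinct ids */

struct usage_table {
    struct usage slot[USAGE_SLOTS];
};

/* Find (or create) the entry for an id using linear probing. */
static struct usage *usage_lookup(struct usage_table *t, uint32_t id)
{
    size_t i = id % USAGE_SLOTS;
    for (size_t n = 0; n < USAGE_SLOTS; n++, i = (i + 1) % USAGE_SLOTS) {
        if (!t->slot[i].used) {
            t->slot[i].used = 1;
            t->slot[i].id = id;
            return &t->slot[i];
        }
        if (t->slot[i].id == id)
            return &t->slot[i];
    }
    return NULL; /* table full */
}

/* Charge one inode to its owning user and group in the same pass. */
static int account_inode(struct usage_table *users, struct usage_table *groups,
                         const struct scan_inode *ino)
{
    if (!ino->in_use)
        return 0;
    struct usage *u = usage_lookup(users, ino->uid);
    struct usage *g = usage_lookup(groups, ino->gid);
    if (!u || !g)
        return -1;
    u->blocks += ino->blocks;
    u->inodes++;
    g->blocks += ino->blocks;
    g->inodes++;
    return 0;
}
```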

What do folks think?  Obviously changes in quotacheck would be required
before it could take advantage of these output files, but hopefully that
shouldn't be hard...

To use it, either run:

   e2fsck -E usrquota_check=/tmp/quota.user,grpquota_check=/tmp/quota.group

or you can edit /etc/e2fsck.conf and add:

[quota]
	directory = /var/e2fsck/quota
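
One way the `-E` string above could be split into its two output paths, as a hedged sketch: `parse_quota_opts()` and `struct quota_opts` are hypothetical names, not the patch's actual option parser, and only the two option names shown above are recognized.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

struct quota_opts {
    char *usr_path;  /* from usrquota_check=... */
    char *grp_path;  /* from grpquota_check=... */
};

/* Parse a comma-separated "key=value" list such as
 * "usrquota_check=/tmp/quota.user,grpquota_check=/tmp/quota.group".
 * Returns 0 on success, -1 on an unknown or malformed option. */
static int parse_quota_opts(const char *arg, struct quota_opts *out)
{
    char *buf = strdup(arg);
    char *save = NULL;
    int ret = 0;

    out->usr_path = out->grp_path = NULL;
    if (!buf)
        return -1;
    for (char *tok = strtok_r(buf, ",", &save); tok;
         tok = strtok_r(NULL, ",", &save)) {
        char *eq = strchr(tok, '=');
        if (!eq || eq[1] == '\0') { ret = -1; break; }
        *eq = '\0';
        char **slot = !strcmp(tok, "usrquota_check") ? &out->usr_path :
                      !strcmp(tok, "grpquota_check") ? &out->grp_path : NULL;
        if (!slot) { ret = -1; break; }
        free(*slot);              /* a later duplicate overrides an earlier one */
        *slot = strdup(eq + 1);
        if (!*slot) { ret = -1; break; }
    }
    free(buf);
    return ret;
}
```

On failure the caller should still free any path that was already stored.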

I still need to write documentation, update the man pages, and do some
polishing, so this is still in a pretty rough state, but I'd appreciate
comments.

Thanks,

     	      					- Ted

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara - March 26, 2010, 12:47 a.m.
Hi Ted,

On Thu 25-03-10 20:20:18, Theodore Ts'o wrote:
> This is something I whipped up last night to speed up quotacheck by
> doing the data collection in e2fsck.  If e2fsck runs and does a full
> check, it's likely that quotacheck needs to be run as well --- and it's
> faster if e2fsck does the dirty work of fetching the information since
> (1) it needs to paw through all of the inodes anyway, and (2) quotacheck
> has to go through the file system and iterate over the files in an
> non-optimal order.
> 
> What do folks think?  Obviously changes in quotacheck would be required
> before it could take advantage of these output files, but hopefully that
> shouldn't be hard...
> 
> To use, either use:
> 
>    e2fsck -E usrquota_check=/tmp/quota.user,grpquota_check=/tmp/quota.group
> 
> or you can edit /etc/e2fsck.conf and add:
> 
> [quota]
> 	directory = /var/e2fsck/quota
> 
> I still need to write documentation, update the man pages, and do some
> polishing, so this is still in a pretty rough state, but I'd appreciate
> comments.
  This is definitely a move in the right direction. I'd be even happier
if e2fsck would write the quota files directly - then we could just make
quota files hidden inodes, start doing quota accounting immediately
on mount and always do quota journaling. That would save us quite some
trouble in the kernel. The only problem with this is that we'd need to pull
knowledge about quota formats into e2fsck...

								Honza
Theodore Ts'o - March 26, 2010, 3:38 a.m.
On Fri, Mar 26, 2010 at 01:47:38AM +0100, Jan Kara wrote:
>   This is definitely a move in the right direction. I'd be even happier
> if e2fsck would write quota file directly - then we could just make
> quota files hidden inodes, start doing quota accounting immediately
> on mount and always do quota journaling. That would save us quite some
> trouble in kernel. The only problem with this is that we'd need to pull
> knowledge about quota formats in e2fsck...

Yes, quite possibly.  The way quota is currently set up is quite
kludgy, with magic options that do nothing but display magic options
in /proc/mounts, just in case /etc/mtab is a link to it.  It
also looks like some of the magic is in various distributions'
init.d scripts, and so while I very much want to clean things up, it
wasn't clear to me how much flexibility we would have without worrying
about breaking the init scripts for Debian, Ubuntu, RHEL, SLES,
Fedora, Open SuSE, etc.

There may also be other programs that depend on the existence of
aquota.user, and may be reading and writing them in various random
ways, and there is the question of how do we provide compatibility
with these other programs, some of which may not be within quotatools,
but in various magic virtualization or container or cluster management
systems....

So maintaining compatibility between older kernels, newer kernels,
older init scripts, new init scripts, etc. may make changing the quota
system quite difficult.  I would like to do as much cleanup as we can,
though.

One question I have --- do we really have to support the 2 or 3
different quota variants?  How many people/distributions are still
using the original old quota system?  One thing that worries me is
that it looks like the old (non-journaled) quota system may be the
primary system still being used by Canonical and Debian...  I really
do hope I'm wrong, but there are a bunch of HOWTOs that still tell people
to use usrquota and grpquota in /etc/fstab, and not the newer
usrjquota and grpjquota mount options.

	       	     	   	   	     	 - Ted
Andreas Dilger - March 26, 2010, 7:01 a.m.
On 2010-03-25, at 21:38, tytso@mit.edu wrote:
> On Fri, Mar 26, 2010 at 01:47:38AM +0100, Jan Kara wrote:
>>  This is definitely a move in the right direction. I'd be even happier
>> if e2fsck would write quota file directly - then we could just make
>> quota files hidden inodes, start doing quota accounting immediately
>> on mount and always do quota journaling. That would save us quite some
>> trouble in kernel. The only problem with this is that we'd need to pull
>> knowledge about quota formats in e2fsck...

I totally agree.  Having to run quotacheck when the quota is journaled  
is a major time waster on a large filesystem.  This is doubly true  
since the only time the journal should ever get inconsistent is when  
e2fsck changes it.

> Yes, quite possibly.  How quota is currently is set up is quite
> kludgy, with magic options that do nothing but display magic options
> in /proc/mounts, just in case that's a hard link to /etc/mtab.  It
> also looks like that some of the magic is in various distribution's
> init.d scripts, and so while I very much want to clean things up, it
> wasn't clear to me how much flexibility we would have without worrying
> about breaking the init scripts for Debian, Ubuntu, RHEL, SLES,
> Fedora, Open SuSE, etc.
>
> There may also be other programs that depend on the existence of
> aquota.user, and may be reading and writing them in various random
> ways, and there is the question of how do we provide compatibility
> with these other programs, some of which may not be within quotatools,
> but in various magic virtualization or container or cluster management
> systems....

If the quota file is already present as a regular file, I don't think
it would be terrible to leave it in place and create new quota
files as hidden files.
It would also be nice to always enable quota journaling in ext4, since
I don't think this does any harm, and if quotacheck isn't run then at
least there is a good chance the quotas are still correct.

> So maintaining compatibility between older kernels, newer kernels,
> older init scripts, new init scripts, etc. may make changing the quota
> system quite difficult.  I would like to do as much cleanup as we can,
> though.
>
> One question I have --- do we really have to support the 2 or 3
> different quota variants?  How many people/distributions are still
> using the original old quota system?  One thing that worries me is
> that it looks like the old (non-journaled) quota system may be the
> primary system still being used by Canonical and Debian...  I really
> do hope I'm wrong, but there are a bunch of HOWTO's that still people
> to use usrquota and grpquota in /etc/fstab, and not the newer
> usrjquota and grpjquota mount options.

If there isn't a reason to continue using unjournaled quota (i.e. it  
doesn't break to just move to journaled quota everywhere), then these  
could just become aliases for the journaled quota implementation.  The  
other alternative is to deprecate these options in the next kernel and  
have it print out a warning on the console to tell the user to switch  
over to the journaled version.
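
A sketch of the aliasing idea, assuming a hypothetical helper `map_quota_option()` that only decides which quota type an option enables; a real mount path would also need the quota file name and format (the `jqfmt=` option) for the journaled variant.

```c
#include <stdio.h>
#include <string.h>

enum quota_kind { QK_NONE, QK_USER, QK_GROUP };

/* Map a quota mount option to the quota type it enables. Legacy
 * (unjournaled) names are accepted as aliases for the journaled
 * implementation, with a deprecation warning if a stream is given. */
static enum quota_kind map_quota_option(const char *opt, FILE *warn)
{
    if (!strncmp(opt, "usrjquota=", 10))
        return QK_USER;
    if (!strncmp(opt, "grpjquota=", 10))
        return QK_GROUP;
    if (!strcmp(opt, "usrquota") || !strcmp(opt, "grpquota")) {
        if (warn)
            fprintf(warn, "warning: '%s' is deprecated; "
                          "treating it as journaled quota\n", opt);
        return opt[0] == 'u' ? QK_USER : QK_GROUP;
    }
    return QK_NONE;
}
```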


Cheers, Andreas
--
Andreas Dilger
Principal Engineer, Lustre Group
Oracle Corporation Canada Inc.

Dmitry Monakhov - March 26, 2010, 8:09 a.m.
Jan Kara <jack@suse.cz> writes:

>   Hi Ted,
>
> On Thu 25-03-10 20:20:18, Theodore Ts'o wrote:
>> This is something I whipped up last night to speed up quotacheck by
>> doing the data collection in e2fsck.  If e2fsck runs and does a full
>> check, it's likely that quotacheck needs to be run as well --- and it's
>> faster if e2fsck does the dirty work of fetching the information since
>> (1) it needs to paw through all of the inodes anyway, and (2) quotacheck
>> has to go through the file system and iterate over the files in an
>> non-optimal order.
>> 
>> What do folks think?  Obviously changes in quotacheck would be required
>> before it could take advantage of these output files, but hopefully that
>> shouldn't be hard...
>> 
>> To use, either use:
>> 
>>    e2fsck -E usrquota_check=/tmp/quota.user,grpquota_check=/tmp/quota.group
>> 
>> or you can edit /etc/e2fsck.conf and add:
>> 
>> [quota]
>> 	directory = /var/e2fsck/quota
>> 
>> I still need to write documentation, update the man pages, and do some
>> polishing, so this is still in a pretty rough state, but I'd appreciate
>> comments.
This is definitely the right idea.
>   This is definitely a move in the right direction. I'd be even happier
> if e2fsck would write quota file directly - then we could just make
> quota files hidden inodes, start doing quota accounting immediately
Please excuse my naive question, but is it easy enough to allocate
space during fsck?  If we allow this, then each fsck run will result
in superblock changes because of temporary quota file
creation/rename/deletion, even if the superblock and quota are OK.
> on mount and always do quota journaling. That would save us quite some
> trouble in kernel. The only problem with this is that we'd need to pull
> knowledge about quota formats in e2fsck...
>
> 								Honza
Dmitry Monakhov - March 26, 2010, 8:18 a.m.
Andreas Dilger <andreas.dilger@oracle.com> writes:

> On 2010-03-25, at 21:38, tytso@mit.edu wrote:
>> On Fri, Mar 26, 2010 at 01:47:38AM +0100, Jan Kara wrote:
>>>  This is definitely a move in the right direction. I'd be even happier
>>> if e2fsck would write quota file directly - then we could just make
>>> quota files hidden inodes, start doing quota accounting immediately
>>> on mount and always do quota journaling. That would save us quite some
>>> trouble in kernel. The only problem with this is that we'd need to pull
>>> knowledge about quota formats in e2fsck...
>
> I totally agree.  Having to run quotacheck when the quota is journaled
> is a major time waster on a large filesystem.  This is doubly true
> since the only time the journal should ever get inconsistent is when
> e2fsck changes it.
>
>> Yes, quite possibly.  How quota is currently is set up is quite
>> kludgy, with magic options that do nothing but display magic options
>> in /proc/mounts, just in case that's a hard link to /etc/mtab.  It
>> also looks like that some of the magic is in various distribution's
>> init.d scripts, and so while I very much want to clean things up, it
>> wasn't clear to me how much flexibility we would have without worrying
>> about breaking the init scripts for Debian, Ubuntu, RHEL, SLES,
>> Fedora, Open SuSE, etc.
>>
>> There may also be other programs that depend on the existence of
>> aquota.user, and may be reading and writing them in various random
>> ways, and there is the question of how do we provide compatibility
>> with these other programs, some of which may not be within quotatools,
>> but in various magic virtualization or container or cluster management
>> systems....
>
> If the quota file is already present as a regular file, I don't think
> it would be terrible to leave it in place, but to create new quota
> files as hidden files.
> It also would be nice to always enable quota journaing in ext4, since
> I don't think this does any harm, and if quotacheck isn't run then at
> least there a good chance the quotas are still correct.
>
>> So maintaining compatibility between older kernels, newer kernels,
>> older init scripts, new init scripts, etc. may make changing the quota
>> system quite difficult.  I would like to do as much cleanup as we can,
>> though.
>>
>> One question I have --- do we really have to support the 2 or 3
>> different quota variants?  How many people/distributions are still
>> using the original old quota system?  One thing that worries me is
>> that it looks like the old (non-journaled) quota system may be the
>> primary system still being used by Canonical and Debian...  I really
>> do hope I'm wrong, but there are a bunch of HOWTO's that still people
>> to use usrquota and grpquota in /etc/fstab, and not the newer
>> usrjquota and grpjquota mount options.
>
> If there isn't a reason to continue using unjournaled quota (i.e. it
> doesn't break to just move to journaled quota everywhere), then these
> could just become aliases for the journaled quota implementation.  The
> other alternative is to deprecate these options in the next kernel and
> have it print out a warning on the console to tell the user to switch
> over to the journaled version.
The only reason not to use journalled quota by default is that
currently it is a bit slower than the unjournalled variant.
This is because each quota change results in a synchronous quota file
update in the per-sb page cache, and this update is protected by i_mutex
and dqio_mutex. It can be fixed easily; I sent an RFC patch two
months ago. I'll update it and submit it this weekend.
>
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Engineer, Lustre Group
> Oracle Corporation Canada Inc.
>
Jan Kara - March 26, 2010, 10:42 a.m.
On Thu 25-03-10 23:38:24, tytso@mit.edu wrote:
> On Fri, Mar 26, 2010 at 01:47:38AM +0100, Jan Kara wrote:
> >   This is definitely a move in the right direction. I'd be even happier
> > if e2fsck would write quota file directly - then we could just make
> > quota files hidden inodes, start doing quota accounting immediately
> > on mount and always do quota journaling. That would save us quite some
> > trouble in kernel. The only problem with this is that we'd need to pull
> > knowledge about quota formats in e2fsck...
> 
> Yes, quite possibly.  How quota is currently is set up is quite
> kludgy, with magic options that do nothing but display magic options
> in /proc/mounts, just in case that's a hard link to /etc/mtab.  It
> also looks like that some of the magic is in various distribution's
> init.d scripts, and so while I very much want to clean things up, it
> wasn't clear to me how much flexibility we would have without worrying
> about breaking the init scripts for Debian, Ubuntu, RHEL, SLES,
> Fedora, Open SuSE, etc.
  Well, init scripts can be fixed and if we provide some grace time for
distros to catch up I believe this isn't that hard.

> There may also be other programs that depend on the existence of
> aquota.user, and may be reading and writing them in various random
> ways, and there is the question of how do we provide compatibility
> with these other programs, some of which may not be within quotatools,
> but in various magic virtualization or container or cluster management
> systems....
  Yeah, this is possible, although I'm not aware of any such program -
except for repquota and warnquota from quota-tools, but I'll take care of
those. What some programs do is change quota files via the kernel
(quotactl) or call programs from quota-tools, but that is fine (and ultimately
the only way I'd like to leave to userspace when the filesystem is mounted).

> So maintaining compatibility between older kernels, newer kernels,
> older init scripts, new init scripts, etc. may make changing the quota
> system quite difficult.  I would like to do as much cleanup as we can,
> though.
  Actually, XFS and OCFS2 already use hidden quota files, so it won't be
a completely new thing.

> One question I have --- do we really have to support the 2 or 3
> different quota variants?  How many people/distributions are still
> using the original old quota system?  One thing that worries me is
> that it looks like the old (non-journaled) quota system may be the
> primary system still being used by Canonical and Debian...  I really
> do hope I'm wrong, but there are a bunch of HOWTO's that still people
> to use usrquota and grpquota in /etc/fstab, and not the newer
> usrjquota and grpjquota mount options.
  Yeah, I believe that support for the oldest quota format can be phased
out - the new format has been around for something like 10 years, and the
old one had its problems already at that time. I guess I'll add a warning
for the people still using it to the next release of quota-tools.
  About quota journaling - it has some performance penalty (changed quota
structures have to be written on every transaction commit instead of just
once at quotaoff time / sync), but I believe that if someone is running a
journaled filesystem, they should also use journaled quotas, because quota
is essentially filesystem metadata.

								Honza
Jan Kara - March 26, 2010, 10:54 a.m.
On Fri 26-03-10 01:01:35, Andreas Dilger wrote:
> >Yes, quite possibly.  How quota is currently is set up is quite
> >kludgy, with magic options that do nothing but display magic options
> >in /proc/mounts, just in case that's a hard link to /etc/mtab.  It
> >also looks like that some of the magic is in various distribution's
> >init.d scripts, and so while I very much want to clean things up, it
> >wasn't clear to me how much flexibility we would have without worrying
> >about breaking the init scripts for Debian, Ubuntu, RHEL, SLES,
> >Fedora, Open SuSE, etc.
> >
> >There may also be other programs that depend on the existence of
> >aquota.user, and may be reading and writing them in various random
> >ways, and there is the question of how do we provide compatibility
> >with these other programs, some of which may not be within quotatools,
> >but in various magic virtualization or container or cluster management
> >systems....
> 
> If the quota file is already present as a regular file, I don't
> think it would be terrible to leave it in place, but to create new
> quota files as hidden files.
> It also would be nice to always enable quota journaing in ext4,
> since I don't think this does any harm, and if quotacheck isn't run
> then at least there a good chance the quotas are still correct.
  Yes, this should be a good option. I imagine we would create RO_COMPAT
features USRQUOTA and GRPQUOTA meaning that the filesystem maintains
quotas in hidden files. And mkfs would directly create these files if
it was asked to.
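
For reference, the usual ext2/3/4 compatibility rule is that a kernel refuses to mount a file system with unknown INCOMPAT bits, allows only a read-only mount with unknown RO_COMPAT bits, and ignores unknown COMPAT bits. The sketch below encodes that rule; the two quota feature bit values are hypothetical placeholders, not values assigned by any ext4 release.

```c
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define FEAT_RO_COMPAT_USRQUOTA 0x1000u
#define FEAT_RO_COMPAT_GRPQUOTA 0x2000u

enum mount_decision { MOUNT_RW, MOUNT_RO_ONLY, MOUNT_REFUSE };

/* known_* are the feature masks this kernel understands. */
static enum mount_decision check_features(uint32_t compat, uint32_t ro_compat,
                                          uint32_t incompat,
                                          uint32_t known_compat,
                                          uint32_t known_ro_compat,
                                          uint32_t known_incompat)
{
    (void)compat;          /* unknown COMPAT bits never block a mount */
    (void)known_compat;
    if (incompat & ~known_incompat)
        return MOUNT_REFUSE;
    if (ro_compat & ~known_ro_compat)
        return MOUNT_RO_ONLY;   /* old kernel could not keep quota up to date */
    return MOUNT_RW;
}
```

Under this rule, an older kernel that does not know the quota bits could still mount such a file system, but only read-only, so it cannot silently let the hidden quota files go stale.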

> >So maintaining compatibility between older kernels, newer kernels,
> >older init scripts, new init scripts, etc. may make changing the quota
> >system quite difficult.  I would like to do as much cleanup as we can,
> >though.
> >
> >One question I have --- do we really have to support the 2 or 3
> >different quota variants?  How many people/distributions are still
> >using the original old quota system?  One thing that worries me is
> >that it looks like the old (non-journaled) quota system may be the
> >primary system still being used by Canonical and Debian...  I really
> >do hope I'm wrong, but there are a bunch of HOWTO's that still people
> >to use usrquota and grpquota in /etc/fstab, and not the newer
> >usrjquota and grpjquota mount options.
> 
> If there isn't a reason to continue using unjournaled quota (i.e. it
> doesn't break to just move to journaled quota everywhere), then
> these could just become aliases for the journaled quota
> implementation.  The other alternative is to deprecate these options
> in the next kernel and have it print out a warning on the console to
> tell the user to switch over to the journaled version.
  If we make quota files hidden and teach quota-tools to not depend on
usr[j]quota options, then we don't need any quota options at all. And I'd
leave usrjquota / grpjquota as they are. Maybe we could issue a warning
when usrquota / grpquota is used but quotacheck already prints the warning
that you should use journaled quotas if it's run on ext3 / ext4. So
we already have this to some extent.

									Honza
Jan Kara - March 26, 2010, 10:57 a.m.
On Fri 26-03-10 11:18:28, Dmitry Monakhov wrote:
> > If there isn't a reason to continue using unjournaled quota (i.e. it
> > doesn't break to just move to journaled quota everywhere), then these
> > could just become aliases for the journaled quota implementation.  The
> > other alternative is to deprecate these options in the next kernel and
> > have it print out a warning on the console to tell the user to switch
> > over to the journaled version.
> The only reason to not use journalled quota by default is the currently
> it is a bit slower than unjournalled variant.
> This is because each quota change result in synchronous quotafile 
> update in per-sb-page-cache. And this update is protected by i_mutex.
> and dqio_mutex. It may be fixed easily. I've sent a RFC patch two
> month ago. I'll update it and will submit it this weekend. 
  Well, there is also some overhead caused by more IO we have to do for
quota journaling and that is essentially unavoidable. But still I believe
we should transition people to journaled quotas...

								Honza
Jan Kara - March 26, 2010, 11 a.m.
On Fri 26-03-10 11:09:59, Dmitry Monakhov wrote:
> Jan Kara <jack@suse.cz> writes:
> > On Thu 25-03-10 20:20:18, Theodore Ts'o wrote:
> >> This is something I whipped up last night to speed up quotacheck by
> >> doing the data collection in e2fsck.  If e2fsck runs and does a full
> >> check, it's likely that quotacheck needs to be run as well --- and it's
> >> faster if e2fsck does the dirty work of fetching the information since
> >> (1) it needs to paw through all of the inodes anyway, and (2) quotacheck
> >> has to go through the file system and iterate over the files in an
> >> non-optimal order.
> >> 
> >> What do folks think?  Obviously changes in quotacheck would be required
> >> before it could take advantage of these output files, but hopefully that
> >> shouldn't be hard...
> >> 
> >> To use, either use:
> >> 
> >>    e2fsck -E usrquota_check=/tmp/quota.user,grpquota_check=/tmp/quota.group
> >> 
> >> or you can edit /etc/e2fsck.conf and add:
> >> 
> >> [quota]
> >> 	directory = /var/e2fsck/quota
> >> 
> >> I still need to write documentation, update the man pages, and do some
> >> polishing, so this is still in a pretty rough state, but I'd appreciate
> >> comments.
> This is definitely right idea.
> >   This is definitely a move in the right direction. I'd be even happier
> > if e2fsck would write quota file directly - then we could just make
> > quota files hidden inodes, start doing quota accounting immediately
> Please excuse my naive question, but is it easy enough to allocate
> space during fsck?  If we allow to do this then each fsck will result
> in sb-changes because of new tmp quota-file creation/rename/deletion
> even if sb and quota is ok.
  Well, what OCFS2, for example, does on a full fsck run is to first
load all information from the quota file, then do the checking and count
usage, and at the end write a new quota file only if the usage for some user
/ group differs from the one loaded from disk (i.e. fsck changed something).
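
A minimal sketch of that compare-before-write step, assuming both the loaded and the recomputed usage are available as arrays sorted by id. `struct id_usage` and `quota_needs_rewrite()` are hypothetical names, not OCFS2 code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-id usage record. */
struct id_usage {
    uint32_t id;
    uint64_t blocks;
    uint64_t inodes;
};

/* Return 1 if any id's usage differs between what was loaded from the
 * quota file and what the fsck scan counted, 0 if they agree.
 * Both arrays must be sorted by id. */
static int quota_needs_rewrite(const struct id_usage *on_disk, size_t n_disk,
                               const struct id_usage *counted, size_t n_counted)
{
    size_t i = 0, j = 0;
    while (i < n_disk || j < n_counted) {
        if (i < n_disk && j < n_counted && on_disk[i].id == counted[j].id) {
            if (on_disk[i].blocks != counted[j].blocks ||
                on_disk[i].inodes != counted[j].inodes)
                return 1;
            i++; j++;
        } else if (j == n_counted ||
                   (i < n_disk && on_disk[i].id < counted[j].id)) {
            /* id in the quota file but owning nothing now: stale if non-zero */
            if (on_disk[i].blocks || on_disk[i].inodes)
                return 1;
            i++;
        } else {
            /* id found by the scan but missing from the quota file */
            if (counted[j].blocks || counted[j].inodes)
                return 1;
            j++;
        }
    }
    return 0;
}
```

With a check like this, a clean fsck run leaves the quota file (and the superblock) untouched.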

								Honza
Dmitry Monakhov - March 26, 2010, 11:15 a.m.
Jan Kara <jack@suse.cz> writes:

> On Fri 26-03-10 11:18:28, Dmitry Monakhov wrote:
>> > If there isn't a reason to continue using unjournaled quota (i.e. it
>> > doesn't break to just move to journaled quota everywhere), then these
>> > could just become aliases for the journaled quota implementation.  The
>> > other alternative is to deprecate these options in the next kernel and
>> > have it print out a warning on the console to tell the user to switch
>> > over to the journaled version.
>> The only reason to not use journalled quota by default is the currently
>> it is a bit slower than unjournalled variant.
>> This is because each quota change result in synchronous quotafile 
>> update in per-sb-page-cache. And this update is protected by i_mutex.
>> and dqio_mutex. It may be fixed easily. I've sent a RFC patch two
>> month ago. I'll update it and will submit it this weekend. 
>   Well, there is also some overhead caused by more IO we have to do for
> quota journaling and that is essentially unavoidable. But still I believe
> we should transition people to journaled quotas...
Agreed. The IO overhead of journalled quota is almost invisible,
and it should be enabled by default once the most annoying lock
contention is resolved.

BTW, I have bad news. It seems journalled quota was broken recently.
Right after I wrote my first letter, I started to update the
quota-speedup patch, and during testing I found that
journalled quota is inconsistent after a power failure (without my patches).
I tested the ext4.git for-next branch.
I'm currently investigating the issue.
>
> 								Honza
Theodore Ts'o - March 26, 2010, 1:38 p.m.
On Fri, Mar 26, 2010 at 11:42:05AM +0100, Jan Kara wrote:
> > There may also be other programs that depend on the existence of
> > aquota.user, and may be reading and writing them in various random
> > ways, and there is the question of how do we provide compatibility
> > with these other programs, some of which may not be within quotatools,
> > but in various magic virtualization or container or cluster management
> > systems....
>   Yeah, this is possible, although I'm not aware of any such program -

Actually, Google's cluster management system is accessing/modifying
aquota.group file directly before and after quota is enabled.  This
may change in the future, but it's one more point of compatibility.

>   Yeah, I believe that support for the oldest quota format can be phased
> out - the new format is around for something like 10 years and it had
> it's problems at that time already. I guess I'll add a warning to the
> next release of quota-tools to the people still using it.

And if we transition to using quotactl calls to access and read the
information in the quota files, then the actual format of the quota
file won't matter any more, right?

Stupid question --- how does repquota work on OCFS2?  I don't see any
quotactl subcommands that would appear to return the functionality
needed by repquota --- unless you just assume that the only uid/gid's
in use are in /etc/passwd and /etc/group, and just call quotactl for
each uid/gid in the system passwd and group files.
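
The passwd-walking approach described above could look roughly like this, using the glibc `quotactl()` interface with `Q_GETQUOTA`. The function name is hypothetical, and ids that own files but have no passwd entry are never reported.

```c
#define _DEFAULT_SOURCE
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/quota.h>

/* Print usage for every uid in the passwd database, asking the kernel
 * one id at a time. Returns the number of uids queried. */
static int report_user_quotas(const char *device, FILE *out)
{
    struct passwd *pw;
    struct dqblk dq;
    int queried = 0;

    setpwent();
    while ((pw = getpwent()) != NULL) {
        queried++;
        if (quotactl(QCMD(Q_GETQUOTA, USRQUOTA), device, (int)pw->pw_uid,
                     (caddr_t)&dq) != 0)
            continue;  /* quotas off, no permission, or bad device */
        fprintf(out, "%-12s %12llu bytes %8llu inodes\n", pw->pw_name,
                (unsigned long long)dq.dqb_curspace,
                (unsigned long long)dq.dqb_curinodes);
    }
    endpwent();
    return queried;
}
```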

     	     	    	   	      	    - Ted
Theodore Ts'o - March 26, 2010, 1:51 p.m.
On Fri, Mar 26, 2010 at 11:54:41AM +0100, Jan Kara wrote:
>   Yes, this should be a good option. I imagine we would create RO_COMPAT
> features USRQUOTA and GRPQUOTA meaning that the filesystem maintains
> quotas in hidden files. And mkfs would directly create these files if
> it was asked to.

Technically we don't even need to make this be an RO_COMPAT feature; a
COMPAT feature might suffice.  We just need to have new superblock
fields which indicate the inode numbers for the user and group quotas.
If the inode number is the reserved inode for user or group quotas,
then it's the hidden inode.  If it's the number corresponding to a
user-visible file then we simply haven't transitioned the file over.
See e2fsck for how we automatically transition a user-visible
.journal file to inode #8.  That part's not hard.
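
A sketch of how such a superblock field could be interpreted. The field and the reserved inode numbers are hypothetical; the thread only establishes that a reserved inode means the file is hidden and any other inode is a visible file not yet migrated.

```c
#include <stdint.h>

/* Hypothetical reserved inode numbers for hidden quota files. */
#define RESERVED_USR_QUOTA_INO 3u
#define RESERVED_GRP_QUOTA_INO 4u

enum quota_file_state { QF_NONE, QF_HIDDEN, QF_VISIBLE_NEEDS_MIGRATION };

/* 0: no quota file; the reserved number: already hidden; anything else:
 * a user-visible file (e.g. aquota.user) that fsck could later move to
 * the reserved inode, as it does for a visible .journal file. */
static enum quota_file_state classify_quota_inum(uint32_t inum, uint32_t reserved)
{
    if (inum == 0)
        return QF_NONE;
    if (inum == reserved)
        return QF_HIDDEN;
    return QF_VISIBLE_NEEDS_MIGRATION;
}
```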

I am worried about the transition to a model where quotas are always
enforced; that's quite different from what we had before.  What
happens if someone runs the quotaoff command?  Does it turn
off quotas?  If the quota files are now hidden, a system administrator
can't use quotacheck (which is an on-line command) to fix bad quotas;
now they have to use e2fsck, which is normally an off-line checker.  I
suppose we could make e2fsck be able to run in an on-line quotacheck
mode, where it only updates quotas and accepts that there may be some
race conditions where the blocks/inodes-in-use numbers won't be
exactly right.

What about use cases where people were accustomed to letting BSD or
MacOS access an ext3 file system, and either accept the quota being
slightly off, or relying on quotacheck to fix things up at some point
later?

These are all things which can be quite surprising to system
administrators...

					- Ted

P.S.  We can add a new superblock field, which is a "quota last
updated time", and if that is less than the superblock write time, it
could be a hint that e2fsck needs to do a quotacheck run.  That could
partially help address the situation of 3rd party OS's/tools accessing
the file system directly.... 
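
The proposed hint reduces to a single comparison. `s_wtime` is the existing superblock write time; `s_quota_mtime` is a hypothetical name for the new field, and the struct is a stand-in, not the real superblock layout.

```c
#include <stdint.h>

/* Stand-in for the two relevant superblock fields (seconds since epoch). */
struct sb_times {
    uint32_t s_wtime;        /* last superblock write */
    uint32_t s_quota_mtime;  /* hypothetical "quota last updated" time */
};

/* If the file system was written after the quota was last brought up to
 * date (e.g. by another OS that does not maintain quota), suggest that
 * e2fsck recompute quota usage. */
static int quota_recheck_hint(const struct sb_times *sb)
{
    return sb->s_quota_mtime < sb->s_wtime;
}
```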
Theodore Ts'o - March 26, 2010, 1:55 p.m.
On Fri, Mar 26, 2010 at 11:09:59AM +0300, Dmitry Monakhov wrote:
> Please excuse my naive question, but is it easy enough to allocate
> space during fsck?  If we allow to do this then each fsck will result
> in sb-changes because of new tmp quota-file creation/rename/deletion
> even if sb and quota is ok.

We can usually allocate space in fsck --- if there's space available
in the file system, of course!  Of course with the quota file most of
the time we should be able to update the quota file in place, so we
wouldn't need to allocate space most of the time.

	      	 	  	     	    - Ted
Dmitry Monakhov - March 26, 2010, 4:27 p.m.
Dmitry Monakhov <dmonakhov@openvz.org> writes:

> Jan Kara <jack@suse.cz> writes:
>
>> On Fri 26-03-10 11:18:28, Dmitry Monakhov wrote:
>>> > If there isn't a reason to continue using unjournaled quota (i.e. it
>>> > doesn't break to just move to journaled quota everywhere), then these
>>> > could just become aliases for the journaled quota implementation.  The
>>> > other alternative is to deprecate these options in the next kernel and
>>> > have it print out a warning on the console to tell the user to switch
>>> > over to the journaled version.
>>> The only reason not to use journalled quota by default is that it is
>>> currently a bit slower than the unjournalled variant.
>>> This is because each quota change results in a synchronous quota file
>>> update in the per-sb page cache, and this update is protected by i_mutex
>>> and dqio_mutex. It can be fixed easily. I sent an RFC patch two
>>> months ago; I'll update it and submit it this weekend.
>>   Well, there is also some overhead caused by more IO we have to do for
>> quota journaling and that is essentially unavoidable. But still I believe
>> we should transition people to journaled quotas...
> Agreed. The I/O overhead of journalled quota is almost invisible,
> and it should be enabled by default once the most annoying lock
> contention is resolved.
>
> BTW, I have bad news: it seems journalled quota was broken recently.
> Right after I wrote the first letter, I started to update the
> quota-speedup patch, and during testing I found that journalled
> quota is inconsistent after a power failure (without my patches).
> I tested the ext4.git/for-next branch.
> Currently I'm investigating the issue.
OK, I've found the root of the issue: dquot_transfer() wasn't called for
symlinks on chown because they lack a ->setattr operation.
Before the 'dquot: cleanup dquot transfer routine' patch, quota_transfer()
was performed by notify_change() itself.
Now it must be handled in the corresponding ->setattr.

BTW, I'm wondering: even if we don't care about quota, an inode's attributes
are metadata and must go through the journal (i.e., via extXXX_setattr),
so every inode type must have a corresponding ->setattr.
Jan Kara - March 29, 2010, 7:35 a.m.
Dmitry Monakhov <dmonakhov@openvz.org> writes:

> OK, I've found the root of the issue: dquot_transfer() wasn't called for
> symlinks on chown because they lack a ->setattr operation.
> Before the 'dquot: cleanup dquot transfer routine' patch, quota_transfer()
> was performed by notify_change() itself.
Forgot to mention that this is not a journalled quota issue, just a
generic quota regression.
> Now it must be handled in the corresponding ->setattr.
>
> BTW, I'm wondering: even if we don't care about quota, an inode's attributes
> are metadata and must go through the journal (i.e., via extXXX_setattr),
> so every inode type must have a corresponding ->setattr.
As always happens, each modification results in unexpected regressions.
In the case of the quota cleanup patch set, moving the quota transfer from
the generic setattr to the fs-specific ->setattr resulted in a hidden
regression, because not all inode types have correct ->setattr methods.
There are too many filesystems to look at, so let's add a sanity check
to notify_change() and remove it after 2-3 months.

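Something like this:
/*
 * Warn when quota is active and a change that affects quota accounting
 * (owner, group or size) is about to be handled without going through a
 * filesystem-specific ->setattr, which is where quota handling now lives.
 */
static int quota_check(struct inode *inode, struct iattr *attr)
{
        if (!sb_any_quota_active(inode->i_sb))
                return 0;
        if ((((attr->ia_valid & ATTR_UID) && attr->ia_uid != inode->i_uid) ||
             ((attr->ia_valid & ATTR_GID) && attr->ia_gid != inode->i_gid) ||
             (attr->ia_valid & ATTR_SIZE)) && !inode->i_op->setattr) {
                WARN_ON(1);
                return 1;
        }
        return 0;
}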

Dmitri Monakho - March 29, 2010, 7:50 a.m.
Jan Kara <jack@suse.cz> writes:

Oops, sorry. The previous email was from me (Dmitry Monakhov);
my email script went crazy. Sorry again.
Jan Kara - March 30, 2010, 12:43 a.m.
On Fri 26-03-10 09:51:36, tytso@mit.edu wrote:
> On Fri, Mar 26, 2010 at 11:54:41AM +0100, Jan Kara wrote:
> >   Yes, this should be a good option. I imagine we would create RO_COMPAT
> > features USRQUOTA and GRPQUOTA meaning that the filesystem maintains
> > quotas in hidden files. And mkfs would directly create these files if
> > it was asked to.
> Technically we don't even need to make this be an RO_COMPAT feature; a
> COMPAT feature might suffice.  We just need to have new superblock
> fields which indicate the inode numbers for the user and group quotas.
> If the inode number is the reserved inode for user or group quotas,
> then it's the hidden inode.  If it's the number corresponding to a
> user-visible file then we simply haven't transitioned the file over.
> See e2fsck for how we automatically transition a user-visible
> .journal file to inode #8.  That part's not hard.
  Yes, this should be fine.

> I am worried about the transition to a model where quotas are always
> enforced; that's quite different from what we had before.  What
  I didn't mean quotas would always be enforced. They would always be
accounted (when the appropriate quota features are set). They would be
enforced only if the admin calls quotaon (and quotaoff turns off only
enforcement, not accounting).

> happens if someone uses the quotaoff command?  Does it turn
> off quotas?  If the quota files are now hidden, a system administrator
> can't use quotacheck (which is an on-line command) to fix bad quotas;
> now they have to use e2fsck, which is normally an off-line checker.  I
> suppose we could make e2fsck be able to run in an on-line quotacheck
> mode, where it only updates quotas and accepts that there may be some
> race conditions where the blocks/inodes-in-use numbers won't be
> exactly right.
  Well, normally, quota information should never be wrong when we journal
quotas and always account them. So we can treat wrong quota information
like any other kind of filesystem corruption (although this inconsistency
is rather harmless for data).

> What about use cases where people were accustomed to letting BSD or
> MacOS access an ext3 file system, and either accept the quota being
> slightly off, or rely on quotacheck to fix things up at some point
> later?
  Well, I'm not sure how often people have multi-OS systems with quotas.
I expect quotas to be used on multiuser machines where the amount of
trust among users is low - i.e. university servers, hosting servers, ...
Not exactly the setting where I would expect the filesystem to be modified
externally. So I don't expect this to be common, and offline
e2fsck should be fine IMHO. But given that it's not too hard to implement
online quotacheck in e2fsck, we can provide it as well...

> P.S.  We can add a new superblock field, which is a "quota last
> updated time", and if that is less than the superblock write time, it
> could be a hint that e2fsck needs to do a quotacheck run.  That could
> partially help address the situation of 3rd party OS's/tools accessing
> the file system directly.... 
  Yes, I think this will be fine for detecting someone modifying the fs,
although making USRQUOTA an RO_COMPAT feature would do as well. But I guess
your solution is easier for users.

								Honza
Jan Kara - March 30, 2010, 12:55 a.m.
On Fri 26-03-10 09:38:56, tytso@mit.edu wrote:
> On Fri, Mar 26, 2010 at 11:42:05AM +0100, Jan Kara wrote:
> > > There may also be other programs that depend on the existence of
> > > aquota.user, and may be reading and writing them in various random
> > > ways, and there is the question of how do we provide compatibility
> > > with these other programs, some of which may not be within quotatools,
> > > but in various magic virtualization or container or cluster management
> > > systems....
> >   Yeah, this is possible, although I'm not aware of any such program -
> 
> Actually, Google's cluster management system is accessing/modifying
> aquota.group file directly before and after quota is enabled.  This
> may change in the future, but it's one more point of compatibility.
  I see. Thanks for info.

> >   Yeah, I believe that support for the oldest quota format can be phased
> > out - the new format has been around for something like 10 years, and the
> > old one already had its problems back then. I guess I'll add a warning for
> > people still using it to the next release of quota-tools.
> 
> And if we transition to using quotactl calls to access and read the
> information in the quota files, then the actual format of the quota
> file won't matter any more, right?
  Yes, hopefully.

> Stupid question --- how does repquota work on OCFS2?  I don't see any
> quotactl subcommands that would appear to return the functionality
> needed by repquota --- unless you just assume that the only uid/gid's
> in use are in /etc/passwd and /etc/group, and just call quotactl for
> each uid/gid in the system passwd and group files.
  Currently it does not work at all. I didn't get to writing it when
writing the original quota support for OCFS2 because the interface wouldn't be
completely trivial and it would be complicated for OCFS2 to expose the
file directly. Probably the interface will have to be something like
readdir, but then you have to have some "handles" and state associated
with them, and it gets complicated. Maybe we could make our life simpler
by returning a read-only, unseekable fd from a repquota quotactl command, and
reading from it would return quota structures. But I haven't thought too
much about it.

								Honza
Dmitri Monakho - March 30, 2010, 5:26 a.m.
Jan Kara <jack@suse.cz> writes:

> On Fri 26-03-10 09:38:56, tytso@mit.edu wrote:
>> Stupid question --- how does repquota work on OCFS2?  I don't see any
>> quotactl subcommands that would appear to return the functionality
>> needed by repquota --- unless you just assume that the only uid/gid's
>> in use are in /etc/passwd and /etc/group, and just call quotactl for
>> each uid/gid in the system passwd and group files.
>   Currently it does not work at all. I didn't get to writing it when
> writing the original quota support for OCFS2 because the interface wouldn't be
> completely trivial and it would be complicated for OCFS2 to expose the
> file directly. Probably the interface will have to be something like
> readdir, but then you have to have some "handles" and state associated
> with them, and it gets complicated. Maybe we could make our life simpler
> by returning a read-only, unseekable fd from a repquota quotactl command, and
> reading from it would return quota structures. But I haven't thought too
> much about it.
OK. I hope we will eventually end up with something like this.
Before introducing this interface, it would be reasonable to redesign
the dquot structures themselves, because they aren't linked together,
so it is not easy to iterate over them without probing each id in a loop.
Jan Kara - March 30, 2010, 12:42 p.m.
On Tue 30-03-10 09:26:52, Dmitry Monakhov wrote:
> Jan Kara <jack@suse.cz> writes:
> > On Fri 26-03-10 09:38:56, tytso@mit.edu wrote:
> >> Stupid question --- how does repquota work on OCFS2?  I don't see any
> >> quotactl subcommands that would appear to return the functionality
> >> needed by repquota --- unless you just assume that the only uid/gid's
> >> in use are in /etc/passwd and /etc/group, and just call quotactl for
> >> each uid/gid in the system passwd and group files.
> >   Currently it does not work at all. I didn't get to writing it when
> > writing the original quota support for OCFS2 because the interface wouldn't be
> > completely trivial and it would be complicated for OCFS2 to expose the
> > file directly. Probably the interface will have to be something like
> > readdir, but then you have to have some "handles" and state associated
> > with them, and it gets complicated. Maybe we could make our life simpler
> > by returning a read-only, unseekable fd from a repquota quotactl command, and
> > reading from it would return quota structures. But I haven't thought too
> > much about it.
> OK. I hope we will eventually end up with something like this.
> Before introducing this interface, it would be reasonable to redesign
> the dquot structures themselves, because they aren't linked together,
> so it is not easy to iterate over them without probing each id in a loop.
  Well, the quotactl call would scan the quota file on disk anyway, because
not all dquot structures need to be loaded in memory. So linking the
structures in memory will not help.

									Honza

Patch

diff --git a/e2fsck/Makefile.in b/e2fsck/Makefile.in
index 8296e72..91b7354 100644
--- a/e2fsck/Makefile.in
+++ b/e2fsck/Makefile.in
@@ -63,8 +63,9 @@  COMPILE_ET=$(top_builddir)/lib/et/compile_et --build-tree
 
 OBJS= crc32.o dict.o unix.o e2fsck.o super.o pass1.o pass1b.o pass2.o \
 	pass3.o pass4.o pass5.o journal.o badblocks.o util.o dirinfo.o \
-	dx_dirinfo.o ehandler.o problem.o message.o recovery.o region.o \
-	revoke.o ea_refcount.o rehash.o profile.o prof_err.o $(MTRACE_OBJ)
+	dx_dirinfo.o ehandler.o problem.o message.o quota.o recovery.o \
+	region.o revoke.o ea_refcount.o rehash.o profile.o prof_err.o \
+	$(MTRACE_OBJ)
 
 PROFILED_OBJS= profiled/dict.o profiled/unix.o profiled/e2fsck.o \
 	profiled/super.o profiled/pass1.o profiled/pass1b.o \
@@ -88,6 +89,7 @@  SRCS= $(srcdir)/e2fsck.c \
 	$(srcdir)/pass4.c \
 	$(srcdir)/pass5.c \
 	$(srcdir)/journal.c \
+	$(srcdir)/quota.c \
 	$(srcdir)/recovery.c \
 	$(srcdir)/revoke.c \
 	$(srcdir)/badblocks.c \
diff --git a/e2fsck/e2fsck.c b/e2fsck/e2fsck.c
index 26f7b5e..331656e 100644
--- a/e2fsck/e2fsck.c
+++ b/e2fsck/e2fsck.c
@@ -159,6 +159,8 @@  errcode_t e2fsck_reset_context(e2fsck_t ctx)
 	for (i=0; i < MAX_EXTENT_DEPTH_COUNT; i++)
 		ctx->extent_depth_count[i] = 0;
 
+	quota_data_release(ctx);
+
 	/* Reset the superblock to the user's requested value */
 	ctx->superblock = ctx->use_superblock;
 
diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
index e763b89..b18b91c 100644
--- a/e2fsck/e2fsck.h
+++ b/e2fsck/e2fsck.h
@@ -61,6 +61,8 @@ 
 #define P_(singular, plural, n) ((n) == 1 ? (singular) : (plural))
 #endif
 
+#include "dict.h"
+
 /*
  * Exit codes used by fsck-type programs
  */
@@ -188,6 +190,13 @@  struct resource_track {
 #define E2F_PASS_1B	6
 
 /*
+ * Quota types
+ */
+#define MAXQUOTAS 2
+#define USRQUOTA  0		/* element used for user quotas */
+#define GRPQUOTA  1		/* element used for group quotas */
+
+/*
  * Define the extended attribute refcount structure
  */
 typedef struct ea_refcount *ext2_refcount_t;
@@ -286,6 +295,12 @@  struct e2fsck_struct {
 	ext2_u32_list	dirs_to_hash;
 
 	/*
+	 * Quota information
+	 */
+	char *quota_fname[MAXQUOTAS];
+	dict_t *quota_dict[MAXQUOTAS];
+
+	/*
 	 * Tuning parameters
 	 */
 	int process_inode_size;
@@ -459,6 +474,18 @@  extern errcode_t e2fsck_adjust_inode_count(e2fsck_t ctx, ext2_ino_t ino,
 					   int adj);
 
 
+/* quota.c */
+extern void default_quota_files_setup(e2fsck_t ctx);
+extern void quota_data_initialize(e2fsck_t ctx);
+extern void quota_data_add(e2fsck_t ctx, struct ext2_inode *inode,
+			   __u64 blocks);
+extern void quota_data_sub(e2fsck_t ctx, struct ext2_inode *inode,
+			   __u64 blocks);
+extern void quota_data_inodes(e2fsck_t ctx, struct ext2_inode *inode,
+			      int adjust);
+extern void quota_data_output(e2fsck_t ctx);
+extern void quota_data_release(e2fsck_t ctx);
+
 /* region.c */
 extern region_t region_create(region_addr_t min, region_addr_t max);
 extern void region_free(region_t region);
diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index c39d837..1ffa90d 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -654,13 +654,15 @@  void e2fsck_pass1(e2fsck_t ctx)
 		return;
 	}
 
+	quota_data_initialize(ctx);
+
 	/*
 	 * If the last orphan field is set, clear it, since the pass1
 	 * processing will automatically find and clear the orphans.
 	 * In the future, we may want to try using the last_orphan
 	 * linked list ourselves, but for now, we clear it so that the
 	 * ext3 mount code won't get confused.
 	 */
 	if (!(ctx->options & E2F_OPT_READONLY)) {
 		if (fs->super->s_last_orphan) {
 			fs->super->s_last_orphan = 0;
@@ -1962,6 +1964,9 @@  static void check_blocks(e2fsck_t ctx, struct problem_context *pctx,
 		}
 	}
 
+	quota_data_add(ctx, inode, pb.num_blocks * (fs->blocksize / 1024));
+	quota_data_inodes(ctx, inode, +1);
+
 	if (!(fs->super->s_feature_ro_compat &
 	      EXT4_FEATURE_RO_COMPAT_HUGE_FILE) ||
 	    !(inode->i_flags & EXT4_HUGE_FILE_FL))
diff --git a/e2fsck/pass1b.c b/e2fsck/pass1b.c
index 99f0a3c..10885d5 100644
--- a/e2fsck/pass1b.c
+++ b/e2fsck/pass1b.c
@@ -583,6 +583,7 @@  static int delete_file_block(ext2_filsys fs,
 	} else {
 		ext2fs_unmark_block_bitmap(ctx->block_found_map, *block_nr);
 		ext2fs_block_alloc_stats(fs, *block_nr, -1);
+		pb->dup_blocks++;
 	}
 
 	return 0;
@@ -599,7 +600,7 @@  static void delete_file(e2fsck_t ctx, ext2_ino_t ino,
 
 	clear_problem_context(&pctx);
 	pctx.ino = pb.ino = ino;
-	pb.dup_blocks = dp->num_dupblocks;
+	pb.dup_blocks = 0;
 	pb.ctx = ctx;
 	pctx.str = "delete_file";
 
@@ -612,6 +613,8 @@  static void delete_file(e2fsck_t ctx, ext2_ino_t ino,
 	if (ctx->inode_bad_map)
 		ext2fs_unmark_inode_bitmap(ctx->inode_bad_map, ino);
 	ext2fs_inode_alloc_stats2(fs, ino, -1, LINUX_S_ISDIR(inode.i_mode));
+	quota_data_sub(ctx, &inode, pb.dup_blocks * (fs->blocksize / 1024));
+	quota_data_inodes(ctx, &inode, -1);
 
 	/* Inode may have changed by block_iterate, so reread it */
 	e2fsck_read_inode(ctx, ino, &inode, "delete_file");
@@ -637,9 +640,11 @@  static void delete_file(e2fsck_t ctx, ext2_ino_t ino,
 		 */
 		if ((count == 0) ||
 		    ext2fs_test_block_bitmap(ctx->block_dup_map,
-					     inode.i_file_acl))
+					     inode.i_file_acl)) {
 			delete_file_block(fs, &inode.i_file_acl,
 					  BLOCK_COUNT_EXTATTR, 0, 0, &pb);
+			quota_data_sub(ctx, &inode, fs->blocksize / 1024);
+		}
 	}
 }
 
diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
index 761c2f1..da4e21b 100644
--- a/e2fsck/pass2.c
+++ b/e2fsck/pass2.c
@@ -1143,6 +1143,11 @@  abort_free_dict:
 	return DIRENT_ABORT;
 }
 
+struct del_block {
+	e2fsck_t		ctx;
+	e2_blkcnt_t		num;
+};
+
 /*
  * This function is called to deallocate a block, and is an interator
  * functioned called by deallocate inode via ext2fs_iterate_block().
@@ -1154,15 +1159,16 @@  static int deallocate_inode_block(ext2_filsys fs,
 				  int ref_offset EXT2FS_ATTR((unused)),
 				  void *priv_data)
 {
-	e2fsck_t	ctx = (e2fsck_t) priv_data;
+	struct del_block *p = priv_data;
 
 	if (HOLE_BLKADDR(*block_nr))
 		return 0;
 	if ((*block_nr < fs->super->s_first_data_block) ||
 	    (*block_nr >= fs->super->s_blocks_count))
 		return 0;
-	ext2fs_unmark_block_bitmap(ctx->block_found_map, *block_nr);
+	ext2fs_unmark_block_bitmap(p->ctx->block_found_map, *block_nr);
 	ext2fs_block_alloc_stats(fs, *block_nr, -1);
+	p->num++;
 	return 0;
 }
 
@@ -1175,6 +1181,7 @@  static void deallocate_inode(e2fsck_t ctx, ext2_ino_t ino, char* block_buf)
 	struct ext2_inode	inode;
 	struct problem_context	pctx;
 	__u32			count;
+	struct del_block	del_block;
 
 	e2fsck_read_inode(ctx, ino, &inode, "deallocate_inode");
 	e2fsck_clear_inode(ctx, ino, &inode, 0, "deallocate_inode");
@@ -1216,8 +1223,10 @@  static void deallocate_inode(e2fsck_t ctx, ext2_ino_t ino, char* block_buf)
 	    (inode.i_size_high || inode.i_size & 0x80000000UL))
 		ctx->large_files--;
 
+	del_block.ctx = ctx;
+	del_block.num = 0;
 	pctx.errcode = ext2fs_block_iterate2(fs, ino, 0, block_buf,
-					    deallocate_inode_block, ctx);
+					    deallocate_inode_block, &del_block);
 	if (pctx.errcode) {
 		fix_problem(ctx, PR_2_DEALLOC_INODE, &pctx);
 		ctx->flags |= E2F_FLAG_ABORT;
diff --git a/e2fsck/pass3.c b/e2fsck/pass3.c
index 5a5fd3e..21963a0 100644
--- a/e2fsck/pass3.c
+++ b/e2fsck/pass3.c
@@ -488,6 +488,9 @@  ext2_ino_t e2fsck_get_lost_and_found(e2fsck_t ctx, int fix)
 	ext2fs_icount_store(ctx->inode_count, ino, 2);
 	ext2fs_icount_store(ctx->inode_link_info, ino, 2);
 	ctx->lost_and_found = ino;
+	quota_data_add(ctx, &inode, fs->blocksize / 1024);
+	quota_data_inodes(ctx, &inode, +1);
+
 #if 0
 	printf("/lost+found created; inode #%lu\n", ino);
 #endif
@@ -790,6 +793,7 @@  errcode_t e2fsck_expand_directory(e2fsck_t ctx, ext2_ino_t dir,
 
 	inode.i_size = (es.last_block + 1) * fs->blocksize;
 	ext2fs_iblk_add_blocks(fs, &inode, es.newblocks);
+	quota_data_add(ctx, &inode, es.newblocks * (fs->blocksize / 1024));
 
 	e2fsck_write_inode(ctx, dir, &inode, "expand_directory");
 
diff --git a/e2fsck/pass4.c b/e2fsck/pass4.c
index d9706ce..0540e63 100644
--- a/e2fsck/pass4.c
+++ b/e2fsck/pass4.c
@@ -63,6 +63,7 @@  static int disconnect_inode(e2fsck_t ctx, ext2_ino_t i,
 			e2fsck_read_bitmaps(ctx);
 			ext2fs_inode_alloc_stats2(fs, i, -1,
 						  LINUX_S_ISDIR(inode->i_mode));
+			quota_data_inodes(ctx, inode, -1);
 			return 0;
 		}
 	}
@@ -183,6 +184,8 @@  void e2fsck_pass4(e2fsck_t ctx)
 	ctx->inode_bb_map = 0;
 	ext2fs_free_inode_bitmap(ctx->inode_imagic_map);
 	ctx->inode_imagic_map = 0;
+	quota_data_output(ctx);
+	quota_data_release(ctx);
 errout:
 	if (buf)
 		ext2fs_free_mem(&buf);
diff --git a/e2fsck/quota.c b/e2fsck/quota.c
new file mode 100644
index 0000000..f25a480
--- /dev/null
+++ b/e2fsck/quota.c
@@ -0,0 +1,275 @@ 
+/*
+ * quota.c --- collect and output quota information 
+ *
+ * Copyright (C) 2010 Theodore Ts'o.
+ *
+ * %Begin-Header%
+ * This file may be redistributed under the terms of the GNU Public
+ * License.
+ * %End-Header%
+ */
+
+#include <errno.h>
+
+#include "e2fsck.h"
+#include "../version.h"
+
+#ifdef HAVE_INTTYPES_H
+#include <inttypes.h>
+#endif
+
+#ifndef HAVE_INTPTR_T
+typedef long intptr_t;
+#endif
+
+/* Needed for architectures where sizeof(int) != sizeof(void *) */
+#define UINT_TO_VOIDPTR(val)  ((void *)(intptr_t)(val))
+#define VOIDPTR_TO_UINT(ptr)  ((unsigned int)(intptr_t)(ptr))
+
+struct quota_el {
+	__u64	blks;
+	__u32	inodes;
+};
+
+static void quota_dnode_free(dnode_t *node,
+			     void *context EXT2FS_ATTR((unused)))
+{
+	void *ptr = node ? dnode_get(node) : 0;
+
+	free(ptr);
+	free(node);
+}
+
+static int dict_uint_cmp(const void *a, const void *b)
+{
+	unsigned int	c, d;
+
+	c = VOIDPTR_TO_UINT(a);
+	d = VOIDPTR_TO_UINT(b);
+
+	return (c > d) - (c < d);  /* no unsigned wraparound */
+}
+
+static char *fn_canon(e2fsck_t ctx, char *name)
+{
+	char *cp, *ret;
+
+	cp = name;
+	if (!strncmp(cp, "/dev/", 5))
+		cp += 5;
+	else if (!strncmp(cp, "/device/", 8))
+		cp += 8;
+	ret = string_copy(ctx, cp, 0);
+	if (!ret)
+		return NULL;
+	for (cp = ret; *cp; cp++)
+		if (*cp == '/')
+			*cp = '_';
+	return ret;
+}
+
+void quota_data_files_default(e2fsck_t ctx)
+{
+	char	*cp, *quota_dir, *name_format, *name;
+	int	do_user, do_group, len;
+
+	profile_get_string(ctx->profile, "quota", "directory", 0, 0,
+			   &quota_dir);
+	if (quota_dir == 0)
+		return;
+	profile_get_string(ctx->profile, "quota", "name_format", 0, "name",
+			   &name_format);
+	profile_get_boolean(ctx->profile, "quota", "usrquota", 0, 1,
+			    &do_user);
+	profile_get_boolean(ctx->profile, "quota", "grpquota", 0, 1,
+			    &do_group);
+
+	if (ctx->quota_fname[USRQUOTA] || ctx->quota_fname[GRPQUOTA] ||
+	    (!do_user && !do_group))
+		return;
+
+	if (!strcmp(name_format, "uuid") ||
+	    !strcmp(name_format, "shortuuid")) {
+		char	uuid[37];
+
+		uuid_unparse(ctx->fs->super->s_uuid, uuid);
+		if (name_format[0] == 's')
+			uuid[8] = 0;
+		name = string_copy(ctx, uuid, 0);
+	} else if (!strcmp(name_format, "device")) {
+		name = fn_canon(ctx, ctx->filesystem_name);
+	} else /* if (!strcmp(name_format, "name")) */ {
+		name = fn_canon(ctx, ctx->device_name);
+	}
+	if (!name)
+		fatal_error(ctx, "Couldn't allocate quota file name!");
+
+	len = strlen(quota_dir) + strlen(name) + 32;
+
+	if (do_user) {
+		ctx->quota_fname[USRQUOTA] = 
+			e2fsck_allocate_memory(ctx, len, "quota file name");
+		sprintf(ctx->quota_fname[USRQUOTA], "%s/%s.user",
+			quota_dir, name);
+	}
+
+	if (do_group) {
+		ctx->quota_fname[GRPQUOTA] = 
+			e2fsck_allocate_memory(ctx, len, "quota file name");
+		sprintf(ctx->quota_fname[GRPQUOTA], "%s/%s.group",
+			quota_dir, name);
+	}
+}
+
+/*
+ * Called in Pass #1 to set up the quota tracking data structures
+ */
+void quota_data_initialize(e2fsck_t ctx)
+{
+	int	i;
+	dict_t	*dict;
+
+	for (i=0; i < MAXQUOTAS; i++) {
+		if (ctx->quota_fname[i] == 0)
+			continue;
+
+		dict = (dict_t *) e2fsck_allocate_memory(ctx, sizeof(dict_t),
+							 "quota data dict");
+		ctx->quota_dict[i] = dict;
+		dict_init(dict, DICTCOUNT_T_MAX, dict_uint_cmp);
+		dict_set_allocator(dict, NULL, quota_dnode_free, NULL);
+	}
+	return;
+}
+
+static struct quota_el *get_qp(e2fsck_t ctx, dict_t *dict, __u32 key)
+{
+	struct quota_el	*qp;
+	dnode_t		*n;
+
+	n = dict_lookup(dict, UINT_TO_VOIDPTR(key));
+	if (n)
+		qp = dnode_get(n);
+	else {
+		qp = e2fsck_allocate_memory(ctx,
+			    sizeof(struct quota_el), "quota block count");
+		dict_alloc_insert(dict, UINT_TO_VOIDPTR(key), qp);
+	}
+	return qp;
+}
+
+/*
+ * Called to charge the blocks used by an inode to its user and group
+ */
+void quota_data_add(e2fsck_t ctx, struct ext2_inode *inode, __u64 blocks)
+{
+	struct quota_el	*qp;
+	dict_t		*dict;
+
+	if ((dict = ctx->quota_dict[USRQUOTA]) != NULL) {
+		qp = get_qp(ctx, dict, inode_uid(*inode));
+		qp->blks += blocks;
+	}
+	if ((dict = ctx->quota_dict[GRPQUOTA]) != NULL) {
+		qp = get_qp(ctx, dict, inode_gid(*inode));
+		qp->blks += blocks;
+	}
+}
+
+/*
+ * Called to uncharge blocks previously charged to an inode's user and group
+ */
+void quota_data_sub(e2fsck_t ctx, struct ext2_inode *inode, __u64 blocks)
+{
+	struct quota_el	*qp;
+	dict_t		*dict;
+
+	if ((dict = ctx->quota_dict[USRQUOTA]) != NULL) {
+		qp = get_qp(ctx, dict, inode_uid(*inode));
+		qp->blks -= blocks;
+	}
+	if ((dict = ctx->quota_dict[GRPQUOTA]) != NULL) {
+		qp = get_qp(ctx, dict, inode_gid(*inode));
+		qp->blks -= blocks;
+	}
+}
+
+/*
+ * Called to adjust the file (inode) count of an inode's user and group
+ */
+void quota_data_inodes(e2fsck_t ctx, struct ext2_inode *inode, int adjust)
+{
+	struct quota_el	*qp;
+	dict_t		*dict;
+
+	if ((dict = ctx->quota_dict[USRQUOTA]) != NULL) {
+		qp = get_qp(ctx, dict, inode_uid(*inode));
+		qp->inodes += adjust;
+	}
+	if ((dict = ctx->quota_dict[GRPQUOTA]) != NULL) {
+		qp = get_qp(ctx, dict, inode_gid(*inode));
+		qp->inodes += adjust;
+	}
+}
+
+/*
+ * Write the collected usage data out to ASCII text files
+ */
+void quota_data_output(e2fsck_t ctx)
+{
+	struct quota_el	*qp;
+	dnode_t		*n;
+	dict_t		*dict;
+	FILE		*f;
+	int		i;
+	__u32		key;
+
+	for (i=0; i < MAXQUOTAS; i++) {
+		dict = ctx->quota_dict[i];
+
+		if (!dict || ctx->quota_fname[i] == 0)
+			continue;
+
+		f = fopen(ctx->quota_fname[i], "w");
+		if (!f) {
+			com_err("quota_data_output", errno,
+				"while trying to open %s",
+				ctx->quota_fname[i]);
+			fatal_error(ctx, 0);
+		}
+		fprintf(f, "# Quota %s file for %s\n#\n",
+			(i == USRQUOTA) ? "user" : "group",
+			ctx->filesystem_name);
+		fprintf(f, "# Generated by e2fsck %s (%s) on %s#\n",
+			E2FSPROGS_VERSION, E2FSPROGS_DATE,
+			asctime(localtime(&ctx->now)));
+		fprintf(f, "# Format: %s-id\tnumblocks\tnumfiles\n#\n",
+			(i == USRQUOTA) ? "user" : "group");
+
+		for (n = dict_first(dict); n; n = dict_next(dict, n)) {
+			key = VOIDPTR_TO_UINT(dnode_getkey(n));
+			qp = dnode_get(n);
+			fprintf(f, "%-9u %-10llu %u\n", key,
+				(unsigned long long) qp->blks, qp->inodes);
+		}
+		fclose(f);
+	}
+}
+
+/*
+ * Release the data structures used to track user/group usage
+ */
+void quota_data_release(e2fsck_t ctx)
+{
+	dict_t	*dict;
+	int	i;
+
+	for (i=0; i < MAXQUOTAS; i++) {
+		dict = ctx->quota_dict[i];
+		if (dict) {
+			dict_free_nodes(dict);
+			free(dict);	/* allocated in quota_data_initialize() */
+		}
+		ctx->quota_dict[i] = 0;
+	}
+}
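To take advantage of these files, quotacheck would have to read them back. What follows is only a sketch, assuming the format quota_data_output() emits: lines starting with '#' are comments, and every other line holds an id, a block count, and a file count separated by whitespace. The names struct quota_rec, parse_quota_line() and quota_rec_cmp() are hypothetical and exist in neither e2fsprogs nor quota-tools. The comparator also shows why a three-way comparison is safer than c - d for unsigned ids: 0 - 4294967294 wraps to 2, so uid 0 would sort after uid 4294967294.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical record type for one data line of an e2fsck quota
 * output file, which quota_data_output() writes as "%-9u %-10llu %u".
 */
struct quota_rec {
	unsigned int		id;	/* user or group id */
	unsigned long long	blocks;	/* "numblocks" column */
	unsigned int		files;	/* "numfiles" column */
};

/*
 * Parse one line of the file.  Returns 1 if a record was stored in
 * *rec, 0 for a '#' comment or blank line, and -1 if the line is
 * malformed.  sscanf() skips any run of spaces or tabs between
 * fields, so both the padded output and tab separators are accepted.
 */
static int parse_quota_line(const char *line, struct quota_rec *rec)
{
	const char *cp = line;

	assert(line != NULL && rec != NULL);
	while (*cp == ' ' || *cp == '\t')
		cp++;
	if (*cp == '#' || *cp == '\n' || *cp == '\0')
		return 0;
	if (sscanf(cp, "%u %llu %u",
		   &rec->id, &rec->blocks, &rec->files) != 3)
		return -1;
	return 1;
}

/*
 * qsort()-style comparator ordering records by id.  It uses a
 * three-way comparison instead of "return c - d", whose unsigned
 * subtraction can wrap and report the wrong sign.
 */
static int quota_rec_cmp(const void *a, const void *b)
{
	unsigned int c = ((const struct quota_rec *) a)->id;
	unsigned int d = ((const struct quota_rec *) b)->id;

	return (c > d) - (c < d);
}
```

A reader would call fgets() once per line, ignore lines for which parse_quota_line() returns 0, and could qsort() the collected records with quota_rec_cmp() before passing them to whatever code updates the on-disk quota files.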
diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index fd62ce5..c749ac0 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -586,6 +586,22 @@  static void parse_extended_opts(e2fsck_t ctx, const char *opts)
 		} else if (strcmp(token, "fragcheck") == 0) {
 			ctx->options |= E2F_OPT_FRAGCHECK;
 			continue;
+		} else if (strcmp(token, "usrquota_check") == 0) {
+			if (!arg) {
+				extended_usage++;
+				continue;
+			}
+			if (ctx->quota_fname[USRQUOTA])
+				free(ctx->quota_fname[USRQUOTA]);
+			ctx->quota_fname[USRQUOTA] = string_copy(ctx, arg, 0);
+		} else if (strcmp(token, "grpquota_check") == 0) {
+			if (!arg) {
+				extended_usage++;
+				continue;
+			}
+			if (ctx->quota_fname[GRPQUOTA])
+				free(ctx->quota_fname[GRPQUOTA]);
+			ctx->quota_fname[GRPQUOTA] = string_copy(ctx, arg, 0);
 		} else {
 			fprintf(stderr, _("Unknown extended option: %s\n"),
 				token);
@@ -600,6 +616,8 @@  static void parse_extended_opts(e2fsck_t ctx, const char *opts)
 		       "is set off by an equals ('=') sign.  "
 		       "Valid extended options are:\n"), stderr);
 		fputs(("\tea_ver=<ea_version (1 or 2)>\n"), stderr);
+		fputs(("\tusrquota_check=<output file name>\n"), stderr);
+		fputs(("\tgrpquota_check=<output file name>\n"), stderr);
 		fputs(("\tfragcheck\n"), stderr);
 		fputc('\n', stderr);
 		exit(1);
@@ -1178,6 +1196,8 @@  failure:
 		if (isspace(*cp) || *cp == ':')
 			*cp = '_';
 
+	quota_data_files_default(ctx);
+
 	ehandler_init(fs->io);
 
 	if ((ctx->mount_flags & EXT2_MF_MOUNTED) &&