Message ID | 20170708050900.afuwwia7c4izliir@thunk.org |
---|---|
State | New, archived |
Headers | show |
On Sat, Jul 08, 2017 at 01:09:00AM -0400, Theodore Ts'o wrote: > I've applied the following change to this patch in order to better > calculate the credits needed by ext4_new_inode.... So I've been thinking about this some more, and why we can't use extend_transaction because it might break up what needs to be single transaction for ext4_new_inode(). And I think I may have come up with an interesting solution. It adds extra complexity, so it may not be worth it, but I'll throw it out for general consideration. What we could do is have ext4_new_inode check to see if there are enough credits to do add the xattr's (if necessary) in a single commit. If not, what we could do is to add the inode to the orphan list, and then set an inode state flag indicating we have done this. At this point, we *can* break the ext4_new_inode() operation into multiple commits, because if we crash in the middle the inode will be cleaned up when we do the orphan list processing. The downsides of this approach is that it causes the orphan list to be a bottleneck. So we would definitely not want to do this all time. And the extra complexity of tracking whether the inode is on the orphan list might not make it worth it. The other thing that we might want to do is to establish a priority weight system so we can indicate which xattrs should have preference to be stored in the inode (no seeks to access), in the xattr block (requires an extra seek to access), or an ea_inode (requires 2-3 seeks to access). In rough order * Security xattrs -- they are smallest, are needed the most often (you might need them even to see if you are allowed to stat the inode). * Encryption xattr, Acl and Rich Acl's --- needed whenever you access the inode. * In-line data * Everything else Within each priority tier, the smaller xattrs should get preferential treatment since the are more likely to leave space for other xattrs in the storage area. One of the other things we could do is to implement a further optimization where once we decide that an xattr in an xattr block should have its value moved to an ea_inode, , if there is space to move the xattr from the xattr block to the extra space in the inode. This could be done in a separate transaction, or even via a workqueue, since it's an optimization of the on-disk encoding of the inode'x xattr. It's not clear the complexity is worth it (maybe it's something we do in e2fsck first, the way we can optimize directories, and from there we can decide if it's worth implementing in the kernel). - Ted
> What we could do is have ext4_new_inode check to see if there are > enough credits to do add the xattr's (if necessary) in a single > commit. If not, what we could do is to add the inode to the orphan > list, and then set an inode state flag indicating we have done this. > At this point, we *can* break the ext4_new_inode() operation into > multiple commits, because if we crash in the middle the inode will be > cleaned up when we do the orphan list processing. This makes sense. Also, we currently add the worst case credit estimates of individual set xattr ops and start a journal handle with the sum of it. A slight optimization is to do this lazily. We can start with enough credits that can get us to a point where it is safe to start a new transaction (safe because of orphan addition). Then opportunistically extend the credits to get us to the next safe point, if that doesn't work, do the orphan add operation and start a new transaction. This should handle the worst case scenario and also optimize for common case. Also this should in general reduce the amount of allocated-but-unused credits which helps parallelism. > The downsides of this approach is that it causes the orphan list to be > a bottleneck. So we would definitely not want to do this all time. Yes and I think lazy extend/restart should mitigate this.
On Sun, Jul 09, 2017 at 01:01:00PM -0700, Tahsin Erdogan wrote: > > What we could do is have ext4_new_inode check to see if there are > > enough credits to do add the xattr's (if necessary) in a single > > commit. If not, what we could do is to add the inode to the orphan > > list, and then set an inode state flag indicating we have done this. > > At this point, we *can* break the ext4_new_inode() operation into > > multiple commits, because if we crash in the middle the inode will be > > cleaned up when we do the orphan list processing. > > This makes sense. Also, we currently add the worst case credit > estimates of individual set xattr ops and start a journal handle with > the sum of it. A slight optimization is to do this lazily. > We can start with enough credits that can get us to a point where it > is safe to start a new transaction (safe because of orphan addition). I still am very concerned about the code complexity that this approach requires. I am also very concerned about the CPU scalability bottleneck that adding and removing the inode from the orphan list would entail. And if we have to wait for the new commit to start so that we can start a new handle, that's also a CPU scalability bottleneck and is guaranteed to add significant latency. One of the nice things about the xattr priority proposal is that it would guarantee that the security xattrs would never be in an ea_inode. (Since in the inode creation case, the only thing they would be competing with is the acl's, which are lower priority). So this reduces the chances of needing to do a lazy extend/restart in the first place. > > The downsides of this approach is that it causes the orphan list to be > > a bottleneck. So we would definitely not want to do this all time. > > Yes and I think lazy extend/restart should mitigate this. It mitigates it so long as we the lazy extent/restart is never/rarely *used*, since that's when we would incur the orphan list overhead. One other bit about the lazy extend/restart idea is that we need to make sure that there are enough credits left for the callers of ext4_new_inode() before it returns. Otherwise the complexity of this approach would infect all of the users of this interface (since they would have to potentially do the extend/restart of the transaction). - Ted
On Jul 9, 2017, at 14:01, Tahsin Erdogan <tahsin@google.com> wrote: >> What we could do is have ext4_new_inode check to see if there are >> enough credits to do add the xattr's (if necessary) in a single >> commit. If not, what we could do is to add the inode to the orphan >> list, and then set an inode state flag indicating we have done this. >> At this point, we *can* break the ext4_new_inode() operation into >> multiple commits, because if we crash in the middle the inode will be >> cleaned up when we do the orphan list processing. > > This makes sense. Also, we currently add the worst case credit > estimates of individual set xattr ops and start a journal handle with > the sum of it. A slight optimization is to do this lazily. > We can start with enough credits that can get us to a point where it > is safe to start a new transaction (safe because of orphan addition). > Then opportunistically extend the credits to get us to the next safe > point, if that doesn't work, do the orphan add operation and start a > new transaction. This should handle the worst case scenario and also > optimize for common case. Also this should in general reduce the > amount of allocated-but-unused credits which helps parallelism. What about accumulating the total xattr size in the credits calculation? In most cases we know the xattr sizes in advance, and if the transaction handle tracks the total xattr size it can make a good estimate whether the xattrs will fit in the inode or not rather than using worst-case credits all the time. >> The downsides of this approach is that it causes the orphan list to be >> a bottleneck. So we would definitely not want to do this all time. > > Yes and I think lazy extend/restart should mitigate this. Jan had a patch to improve the orphan list performance that never made it into the kernel by having a per-CPU orphan list or similar. It recall it got hung up on running out of reserved inodes or similar, which is an issue we should fix in any case. Cheers, Andreas
On Fri 14-07-17 15:13:27, Andreas Dilger wrote: > >> The downsides of this approach is that it causes the orphan list to be > >> a bottleneck. So we would definitely not want to do this all time. > > > > Yes and I think lazy extend/restart should mitigate this. > > Jan had a patch to improve the orphan list performance that never made it > into the kernel by having a per-CPU orphan list or similar. > > It recall it got hung up on running out of reserved inodes or similar, which > is an issue we should fix in any case. Yeah, and I've submitted patches for e2fsprogs to be able to set number of reserved inodes however they never got merged. I guess I can refresh those patches and resend if there's interest in the functionality and the feature... Honza
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 21a2538afcc2..507bfb3344d4 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -784,12 +784,38 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir, } if (!handle && sbi->s_journal && !(i_flags & EXT4_EA_INODE_FL)) { - /* - * 2 ea entries for ext4_init_acl(), 2 for ext4_init_security(). - */ - nblocks += 4 * __ext4_xattr_set_credits(sb, NULL /* inode */, - NULL /* block_bh */, XATTR_SIZE_MAX, +#ifdef CONFIG_EXT4_FS_POSIX_ACL + struct posix_acl *p = get_acl(dir, ACL_TYPE_DEFAULT); + + if (p) { + int acl_size = p->a_count * sizeof(ext4_acl_entry); + + nblocks += (S_ISDIR(mode) ? 2 : 1) * + __ext4_xattr_set_credits(sb, NULL /* inode */, + NULL /* block_bh */, acl_size, + true /* is_create */); + posix_acl_release(p); + } +#endif + +#ifdef CONFIG_SECURITY + { + int num_security_xattrs = 1; + +#ifdef CONFIG_INTEGRITY + num_security_xattrs++; +#endif + /* + * We assume that security xattrs are never + * more than 1k. In practice they are under + * 128 bytes. + */ + nblocks += num_security_xattrs * + __ext4_xattr_set_credits(sb, NULL /* inode */, + NULL /* block_bh */, 1024, true /* is_create */); + } +#endif if (encrypt) nblocks += __ext4_xattr_set_credits(sb, NULL /* inode */, NULL /* block_bh */,