Message ID | 2086612.1747054957@warthog.procyon.org.uk |
---|---|
State | New |
Headers | show |
Series | [v2] afs, bash: Fix open(O_CREAT) on an extant AFS file in a sticky dir | expand |
On Mon, May 12, 2025 at 02:02:37PM +0100, David Howells wrote: > Christian Brauner <brauner@kernel.org> wrote: > > > > Now, in my patch, I added two inode ops because they VFS code involved makes > > > two distinct evaluations and so I made an op for each and, as such, those > > > evaluations may be applicable elsewhere, but I could make a combined op that > > > handles that specific situation instead. > > > > Try to make it one, please. > > Okay, see attached. > > David > ---- > Bash has a work around in redir_open() that causes open(O_CREAT) of a file > in a sticky directory to be retried without O_CREAT if bash was built with > AFS workarounds configured: > > #if defined (AFS) > if ((fd < 0) && (errno == EACCES)) > { > fd = open (filename, flags & ~O_CREAT, mode); > errno = EACCES; /* restore errno */ > } > > #endif /* AFS */ > > This works around the kernel not being able to validly check the > current_fsuid() against i_uid on the file or the directory because the > uidspaces of the system and of AFS may well be disjoint. The problem lies > with the uid checks in may_create_in_sticky(). > > However, the bash work around is going to be removed: > > https://git.savannah.gnu.org/cgit/bash.git/tree/redir.c?h=bash-5.3-rc1#n733 > > Fix this in the kernel by providing a ->may_create_in_sticky() inode op, > similar to ->permission(), that, if provided, is called to: > > (1) see if an inode has the same owner as the parent on the path walked; > > (2) determine if the caller owns the file instead of checking the i_uid to > current_fsuid(). > > For kafs, the hook is implemented to see if: > > (1) the AFS owner IDs retrieved on the file and its parent directory by > FS.FetchStatus match; > > (2) if the server set the ADMINISTER bit in the access rights returned by > the FS.FetchStatus and suchlike for the key, indicating ownership by > the user specified by the key. > > (Note that the owner IDs retrieved from an AuriStor YFS server may not fit > in the kuid_t being 64-bit, so they need comparing directly). There's a few other places where we compare vfsuids: * may_delete() -> check_sticky() -> __check_sticky() * may_follow_link() * may_linkat() * fsuidgid_has_mapping() Anyone of those need special treatment on AFS as well? > This can be tested by creating a sticky directory (the user must have a > token to do this) and creating a file in it. Then strace bash doing "echo > foo >>file" and look at whether bash does a single, successful O_CREAT open > on the file or whether that one fails and then bash does one without > O_CREAT that succeeds. > > Reported-by: Etienne Champetier <champetier.etienne@gmail.com> > Signed-off-by: David Howells <dhowells@redhat.com> > cc: Marc Dionne <marc.dionne@auristor.com> > cc: Jeffrey Altman <jaltman@auristor.com> > cc: Chet Ramey <chet.ramey@case.edu> > cc: Alexander Viro <viro@zeniv.linux.org.uk> > cc: Christian Brauner <brauner@kernel.org> > cc: Steve French <sfrench@samba.org> > cc: linux-afs@lists.infradead.org > cc: openafs-devel@openafs.org > cc: linux-cifs@vger.kernel.org > cc: linux-fsdevel@vger.kernel.org > --- > fs/afs/dir.c | 1 + > fs/afs/file.c | 1 + > fs/afs/internal.h | 2 ++ > fs/afs/security.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > fs/namei.c | 17 ++++++++++++----- > include/linux/fs.h | 2 ++ > 6 files changed, 70 insertions(+), 5 deletions(-) > > diff --git a/fs/afs/dir.c b/fs/afs/dir.c > index 9e7b1fe82c27..27e565612bde 100644 > --- a/fs/afs/dir.c > +++ b/fs/afs/dir.c > @@ -65,6 +65,7 @@ const struct inode_operations afs_dir_inode_operations = { > .permission = afs_permission, > .getattr = afs_getattr, > .setattr = afs_setattr, > + .may_create_in_sticky = afs_may_create_in_sticky, > }; > > const struct address_space_operations afs_dir_aops = { > diff --git a/fs/afs/file.c b/fs/afs/file.c > index fc15497608c6..dff48d0adec3 100644 > --- a/fs/afs/file.c > +++ b/fs/afs/file.c > @@ -47,6 +47,7 @@ const struct inode_operations afs_file_inode_operations = { > .getattr = afs_getattr, > .setattr = afs_setattr, > .permission = afs_permission, > + .may_create_in_sticky = afs_may_create_in_sticky, > }; > > const struct address_space_operations afs_file_aops = { > diff --git a/fs/afs/internal.h b/fs/afs/internal.h > index 440b0e731093..4a5bb01606a8 100644 > --- a/fs/afs/internal.h > +++ b/fs/afs/internal.h > @@ -1495,6 +1495,8 @@ extern struct key *afs_request_key(struct afs_cell *); > extern struct key *afs_request_key_rcu(struct afs_cell *); > extern int afs_check_permit(struct afs_vnode *, struct key *, afs_access_t *); > extern int afs_permission(struct mnt_idmap *, struct inode *, int); > +int afs_may_create_in_sticky(struct mnt_idmap *idmap, struct inode *inode, > + struct path *path); > extern void __exit afs_clean_up_permit_cache(void); > > /* > diff --git a/fs/afs/security.c b/fs/afs/security.c > index 6a7744c9e2a2..9fd6e4b5c228 100644 > --- a/fs/afs/security.c > +++ b/fs/afs/security.c > @@ -477,6 +477,58 @@ int afs_permission(struct mnt_idmap *idmap, struct inode *inode, > return ret; > } > > +/* > + * Perform the ownership checks for a file in a sticky directory on AFS. > + * > + * In the case of AFS, this means that: > + * > + * (1) the file and the directory have the same AFS ownership or > + * > + * (2) the file is owned by the AFS user represented by the token (e.g. from a > + * kerberos server) held in a key. > + * > + * Returns 0 if owned by me or has same owner as parent dir, 1 if not; can also > + * return an error. > + */ > +int afs_may_create_in_sticky(struct mnt_idmap *idmap, struct inode *inode, > + struct path *path) > +{ > + struct afs_vnode *dvnode, *vnode = AFS_FS_I(inode); > + struct dentry *parent; > + struct key *key; > + afs_access_t access; > + int ret; > + s64 owner; > + > + key = afs_request_key(vnode->volume->cell); > + if (IS_ERR(key)) > + return PTR_ERR(key); > + > + /* Get the owner's ID for the directory. Ideally, we'd use RCU to > + * access the parent rather than getting a ref. > + */ > + parent = dget_parent(path->dentry); > + dvnode = AFS_FS_I(d_backing_inode(parent)); > + owner = dvnode->status.owner; > + dput(parent); > + > + if (vnode->status.owner == owner) { > + ret = 0; > + goto error; > + } > + > + /* Get the access rights for the key on this file. */ > + ret = afs_check_permit(vnode, key, &access); > + if (ret < 0) > + goto error; > + > + /* We get the ADMINISTER bit if we own the file. */ > + ret = (access & AFS_ACE_ADMINISTER) ? 1 : 0; > +error: > + key_put(key); > + return ret; > +} > + > void __exit afs_clean_up_permit_cache(void) > { > int i; > diff --git a/fs/namei.c b/fs/namei.c > index 84a0e0b0111c..e52c91cbed2a 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -1316,13 +1316,20 @@ static int may_create_in_sticky(struct mnt_idmap *idmap, struct nameidata *nd, > if (S_ISFIFO(inode->i_mode) && !sysctl_protected_fifos) > return 0; > > - i_vfsuid = i_uid_into_vfsuid(idmap, inode); > + if (unlikely(inode->i_op->may_create_in_sticky)) { > + int ret = inode->i_op->may_create_in_sticky(idmap, inode, &nd->path); This should probably use an IOP flag just like we do for permission handling. > > - if (vfsuid_eq(i_vfsuid, dir_vfsuid)) > - return 0; > + if (ret <= 0) /* 1 if not owned by me or by parent dir. */ > + return ret; > + } else { > + i_vfsuid = i_uid_into_vfsuid(idmap, inode); > > - if (vfsuid_eq_kuid(i_vfsuid, current_fsuid())) > - return 0; > + if (vfsuid_eq(i_vfsuid, dir_vfsuid)) > + return 0; > + > + if (vfsuid_eq_kuid(i_vfsuid, current_fsuid())) > + return 0; > + } > > if (likely(dir_mode & 0002)) { > audit_log_path_denied(AUDIT_ANOM_CREAT, "sticky_create"); > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 016b0fe1536e..11122e169719 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -2236,6 +2236,8 @@ struct inode_operations { > struct dentry *dentry, struct fileattr *fa); > int (*fileattr_get)(struct dentry *dentry, struct fileattr *fa); > struct offset_ctx *(*get_offset_ctx)(struct inode *inode); > + int (*may_create_in_sticky)(struct mnt_idmap *idmap, struct inode *inode, > + struct path *path); > } ____cacheline_aligned; > > static inline int call_mmap(struct file *file, struct vm_area_struct *vma) >
Christian Brauner <brauner@kernel.org> wrote: > There's a few other places where we compare vfsuids: > > * may_delete() > -> check_sticky() > -> __check_sticky() > > * may_follow_link() > > * may_linkat() > > * fsuidgid_has_mapping() > > Anyone of those need special treatment on AFS as well? That's a good question. I think it might be better to switch back to the v1 patch - which gives me two separate ops and provide a couple of vfs wrappers for them and use them more widely. So, perhaps: vfs_have_same_owner(inode1, inode2) which indicates if the two inodes have the same ownership and: vfs_is_owned_by_me(inode) which compares the inode's ownership to current_fsuid() by default. The following places need to be considered for being changed: (*) chown_ok() (*) chgrp_ok() Should call vfs_is_owned_by_me(). Possibly these need to defer all their checks to the network filesystem as the interpretation of the target UID/GID depends on the netfs. (*) do_coredump() Should probably call vfs_is_owned_by_me() to check that the file created is owned by the caller - but the check that's there might be sufficient. (*) inode_owner_or_capable() Should call vfs_is_owned_by_me(). I'm not sure whether the namespace mapping makes sense in such a case, but it probably could be used. (*) vfs_setlease() Should call vfs_is_owned_by_me(). Actually, it should query if leasing is permitted. Also, setting locks could perhaps do with a permission call to the filesystem driver as AFS, for example, has a lock permission bit in the ACL, but since the AFS server checks that when the RPC call is made, it's probably unnecessary. (*) acl_permission_check() (*) posix_acl_permission() UIDs are part of these ACLs, so no change required. AFS implements its own ACLs and evaluates them in ->permission() and on the server. (*) may_follow_link() Should call vfs_is_owned_by_me() and also vfs_have_same_owner() on the the link and its parent dir. The latter only applies on world-writable sticky dirs. (*) may_create_in_sticky() The initial subject of this patch. Should call vfs_is_owned_by_me() and also vfs_have_same_owner() both. (*) __check_sticky() Should call vfs_is_owned_by_me() on both the dir and the inode. (*) may_dedupe_file() Should call vfs_is_owned_by_me(). (*) IMA policy ops. No idea. David
I performed a review of the usage of vfsuid_eq_kuid() and vfsuid_eq(). I mostly agree with David's conclusions and add some additional insight into the behavior of AFS servers. On 5/13/2025 4:30 AM, David Howells wrote: > Christian Brauner <brauner@kernel.org> wrote: > >> There's a few other places where we compare vfsuids: >> >> * may_delete() >> -> check_sticky() >> -> __check_sticky() >> >> * may_follow_link() >> >> * may_linkat() >> >> * fsuidgid_has_mapping() >> >> Anyone of those need special treatment on AFS as well? > That's a good question. I think it might be better to switch back to the v1 > patch - which gives me two separate ops and provide a couple of vfs wrappers > for them and use them more widely. > > So, perhaps: > > vfs_have_same_owner(inode1, inode2) > > which indicates if the two inodes have the same ownership and: > > vfs_is_owned_by_me(inode) > > which compares the inode's ownership to current_fsuid() by default. The use of two distinct inode operations make the most sense to me. An alternative is to provide one inode operation which sets two boolean output parameters: int (*check_ownership)(struct inode *const inode, struct inode *const parent,int *is_owned_by_me, int *is_owned_by_parent); where 'is_owned_by_me' or 'is_owned_by_parent' might be NULL if the answer is not required. However, I prefer David's suggestion. > The following places need to be considered for being changed: > > (*) chown_ok() > (*) chgrp_ok() > > Should call vfs_is_owned_by_me(). Possibly these need to defer all their > checks to the network filesystem as the interpretation of the target > UID/GID depends on the netfs. Since the late 1980s, afs servers do not permit changes to owner or group on files unless the caller is a member of the system:administrators group. The file system clients cannot make this determination themselves. If Linux wishes to further restrict the operation to current owner, then use of a vfs_is_owned_by_me() like inode operation should be used. Something to consider for future AFS3 or YFS protocol changes is to report the right to chown|chgrp to the client as part of a the FetchStatus result set. > (*) do_coredump() > > Should probably call vfs_is_owned_by_me() to check that the file created > is owned by the caller - but the check that's there might be sufficient. I agree. > (*) inode_owner_or_capable() > > Should call vfs_is_owned_by_me(). I agree. > I'm not sure whether the namespace > mapping makes sense in such a case, but it probably could be used. > > (*) vfs_setlease() > > Should call vfs_is_owned_by_me(). Actually, it should query if leasing > is permitted. > > Also, setting locks could perhaps do with a permission call to the > filesystem driver as AFS, for example, has a lock permission bit in the > ACL, but since the AFS server checks that when the RPC call is made, it's > probably unnecessary. The AFS server will grant locks based upon the following rules: * the caller is granted the PRSFS_LOCK right (Shared lock only) * the caller is granted the PRSFS_WRITE right (Shared or Exclusive lock) * the caller is the file owner and is granted the PRSFS_INSERT right (Shared or Exclusive lock) The client has enough information to implement a lock permission check if there was such an inode operation. > (*) acl_permission_check() > (*) posix_acl_permission() > > UIDs are part of these ACLs, so no change required. AFS implements its > own ACLs and evaluates them in ->permission() and on the server. acl_permission_check() and posix_acl_permission() will not be called for AFS. However, it it probably worth adding the vfs_is_owned_by_me() to acl_permission_check() in case there is another network filesystem which requires non-uid ownership checks and wants to use generic_permission(). > (*) may_follow_link() > > Should call vfs_is_owned_by_me() and also vfs_have_same_owner() on the > the link and its parent dir. The latter only applies on world-writable > sticky dirs. I agree > (*) may_create_in_sticky() > > The initial subject of this patch. Should call vfs_is_owned_by_me() and > also vfs_have_same_owner() both. I agree. > (*) __check_sticky() > > Should call vfs_is_owned_by_me() on both the dir and the inode. I agree. > (*) may_dedupe_file() > > Should call vfs_is_owned_by_me(). I agree. > > (*) IMA policy ops. > > No idea. I am not familiar with the Integrity Measurement Operations. However, looking at the usage of the ima_rule_entry fowner_op and fgroup_op operations, I do not believe the proposed vfs_is_owned_by_me() could be used to implement fowner_op. If IMA should work filesystems which cannot rely upon local uid comparisons for owner and group, then I think the IMA fowner_op and fgroup_op would require an alternative implementation. At the moment, IMA is unlikely to work properly with AFS. > David Jeffrey Altman
diff --git a/fs/afs/dir.c b/fs/afs/dir.c index 9e7b1fe82c27..27e565612bde 100644 --- a/fs/afs/dir.c +++ b/fs/afs/dir.c @@ -65,6 +65,7 @@ const struct inode_operations afs_dir_inode_operations = { .permission = afs_permission, .getattr = afs_getattr, .setattr = afs_setattr, + .may_create_in_sticky = afs_may_create_in_sticky, }; const struct address_space_operations afs_dir_aops = { diff --git a/fs/afs/file.c b/fs/afs/file.c index fc15497608c6..dff48d0adec3 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -47,6 +47,7 @@ const struct inode_operations afs_file_inode_operations = { .getattr = afs_getattr, .setattr = afs_setattr, .permission = afs_permission, + .may_create_in_sticky = afs_may_create_in_sticky, }; const struct address_space_operations afs_file_aops = { diff --git a/fs/afs/internal.h b/fs/afs/internal.h index 440b0e731093..4a5bb01606a8 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -1495,6 +1495,8 @@ extern struct key *afs_request_key(struct afs_cell *); extern struct key *afs_request_key_rcu(struct afs_cell *); extern int afs_check_permit(struct afs_vnode *, struct key *, afs_access_t *); extern int afs_permission(struct mnt_idmap *, struct inode *, int); +int afs_may_create_in_sticky(struct mnt_idmap *idmap, struct inode *inode, + struct path *path); extern void __exit afs_clean_up_permit_cache(void); /* diff --git a/fs/afs/security.c b/fs/afs/security.c index 6a7744c9e2a2..9fd6e4b5c228 100644 --- a/fs/afs/security.c +++ b/fs/afs/security.c @@ -477,6 +477,58 @@ int afs_permission(struct mnt_idmap *idmap, struct inode *inode, return ret; } +/* + * Perform the ownership checks for a file in a sticky directory on AFS. + * + * In the case of AFS, this means that: + * + * (1) the file and the directory have the same AFS ownership or + * + * (2) the file is owned by the AFS user represented by the token (e.g. from a + * kerberos server) held in a key. + * + * Returns 0 if owned by me or has same owner as parent dir, 1 if not; can also + * return an error. + */ +int afs_may_create_in_sticky(struct mnt_idmap *idmap, struct inode *inode, + struct path *path) +{ + struct afs_vnode *dvnode, *vnode = AFS_FS_I(inode); + struct dentry *parent; + struct key *key; + afs_access_t access; + int ret; + s64 owner; + + key = afs_request_key(vnode->volume->cell); + if (IS_ERR(key)) + return PTR_ERR(key); + + /* Get the owner's ID for the directory. Ideally, we'd use RCU to + * access the parent rather than getting a ref. + */ + parent = dget_parent(path->dentry); + dvnode = AFS_FS_I(d_backing_inode(parent)); + owner = dvnode->status.owner; + dput(parent); + + if (vnode->status.owner == owner) { + ret = 0; + goto error; + } + + /* Get the access rights for the key on this file. */ + ret = afs_check_permit(vnode, key, &access); + if (ret < 0) + goto error; + + /* We get the ADMINISTER bit if we own the file. */ + ret = (access & AFS_ACE_ADMINISTER) ? 1 : 0; +error: + key_put(key); + return ret; +} + void __exit afs_clean_up_permit_cache(void) { int i; diff --git a/fs/namei.c b/fs/namei.c index 84a0e0b0111c..e52c91cbed2a 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1316,13 +1316,20 @@ static int may_create_in_sticky(struct mnt_idmap *idmap, struct nameidata *nd, if (S_ISFIFO(inode->i_mode) && !sysctl_protected_fifos) return 0; - i_vfsuid = i_uid_into_vfsuid(idmap, inode); + if (unlikely(inode->i_op->may_create_in_sticky)) { + int ret = inode->i_op->may_create_in_sticky(idmap, inode, &nd->path); - if (vfsuid_eq(i_vfsuid, dir_vfsuid)) - return 0; + if (ret <= 0) /* 1 if not owned by me or by parent dir. */ + return ret; + } else { + i_vfsuid = i_uid_into_vfsuid(idmap, inode); - if (vfsuid_eq_kuid(i_vfsuid, current_fsuid())) - return 0; + if (vfsuid_eq(i_vfsuid, dir_vfsuid)) + return 0; + + if (vfsuid_eq_kuid(i_vfsuid, current_fsuid())) + return 0; + } if (likely(dir_mode & 0002)) { audit_log_path_denied(AUDIT_ANOM_CREAT, "sticky_create"); diff --git a/include/linux/fs.h b/include/linux/fs.h index 016b0fe1536e..11122e169719 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2236,6 +2236,8 @@ struct inode_operations { struct dentry *dentry, struct fileattr *fa); int (*fileattr_get)(struct dentry *dentry, struct fileattr *fa); struct offset_ctx *(*get_offset_ctx)(struct inode *inode); + int (*may_create_in_sticky)(struct mnt_idmap *idmap, struct inode *inode, + struct path *path); } ____cacheline_aligned; static inline int call_mmap(struct file *file, struct vm_area_struct *vma)