diff mbox series

[v2] afs, bash: Fix open(O_CREAT) on an extant AFS file in a sticky dir

Message ID 2086612.1747054957@warthog.procyon.org.uk
State New
Headers show
Series [v2] afs, bash: Fix open(O_CREAT) on an extant AFS file in a sticky dir | expand

Commit Message

David Howells May 12, 2025, 1:02 p.m. UTC
Christian Brauner <brauner@kernel.org> wrote:

> > Now, in my patch, I added two inode ops because they VFS code involved makes
> > two distinct evaluations and so I made an op for each and, as such, those
> > evaluations may be applicable elsewhere, but I could make a combined op that
> > handles that specific situation instead.
> 
> Try to make it one, please.

Okay, see attached.

David
----
Bash has a work around in redir_open() that causes open(O_CREAT) of a file
in a sticky directory to be retried without O_CREAT if bash was built with
AFS workarounds configured:

        #if defined (AFS)
              if ((fd < 0) && (errno == EACCES))
            {
              fd = open (filename, flags & ~O_CREAT, mode);
              errno = EACCES;    /* restore errno */
            }

        #endif /* AFS */

This works around the kernel not being able to validly check the
current_fsuid() against i_uid on the file or the directory because the
uidspaces of the system and of AFS may well be disjoint.  The problem lies
with the uid checks in may_create_in_sticky().

However, the bash work around is going to be removed:

        https://git.savannah.gnu.org/cgit/bash.git/tree/redir.c?h=bash-5.3-rc1#n733

Fix this in the kernel by providing a ->may_create_in_sticky() inode op,
similar to ->permission(), that, if provided, is called to:

 (1) see if an inode has the same owner as the parent on the path walked;

 (2) determine if the caller owns the file instead of checking the i_uid to
     current_fsuid().

For kafs, the hook is implemented to see if:

 (1) the AFS owner IDs retrieved on the file and its parent directory by
     FS.FetchStatus match;

 (2) if the server set the ADMINISTER bit in the access rights returned by
     the FS.FetchStatus and suchlike for the key, indicating ownership by
     the user specified by the key.

(Note that the owner IDs retrieved from an AuriStor YFS server may not fit
in the kuid_t being 64-bit, so they need comparing directly).

This can be tested by creating a sticky directory (the user must have a
token to do this) and creating a file in it.  Then strace bash doing "echo
foo >>file" and look at whether bash does a single, successful O_CREAT open
on the file or whether that one fails and then bash does one without
O_CREAT that succeeds.

Reported-by: Etienne Champetier <champetier.etienne@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Chet Ramey <chet.ramey@case.edu>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Christian Brauner <brauner@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: linux-afs@lists.infradead.org
cc: openafs-devel@openafs.org
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
 fs/afs/dir.c       |    1 +
 fs/afs/file.c      |    1 +
 fs/afs/internal.h  |    2 ++
 fs/afs/security.c  |   52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/namei.c         |   17 ++++++++++++-----
 include/linux/fs.h |    2 ++
 6 files changed, 70 insertions(+), 5 deletions(-)

Comments

Christian Brauner May 13, 2025, 7:49 a.m. UTC | #1
On Mon, May 12, 2025 at 02:02:37PM +0100, David Howells wrote:
> Christian Brauner <brauner@kernel.org> wrote:
> 
> > > Now, in my patch, I added two inode ops because they VFS code involved makes
> > > two distinct evaluations and so I made an op for each and, as such, those
> > > evaluations may be applicable elsewhere, but I could make a combined op that
> > > handles that specific situation instead.
> > 
> > Try to make it one, please.
> 
> Okay, see attached.
> 
> David
> ----
> Bash has a work around in redir_open() that causes open(O_CREAT) of a file
> in a sticky directory to be retried without O_CREAT if bash was built with
> AFS workarounds configured:
> 
>         #if defined (AFS)
>               if ((fd < 0) && (errno == EACCES))
>             {
>               fd = open (filename, flags & ~O_CREAT, mode);
>               errno = EACCES;    /* restore errno */
>             }
> 
>         #endif /* AFS */
> 
> This works around the kernel not being able to validly check the
> current_fsuid() against i_uid on the file or the directory because the
> uidspaces of the system and of AFS may well be disjoint.  The problem lies
> with the uid checks in may_create_in_sticky().
> 
> However, the bash work around is going to be removed:
> 
>         https://git.savannah.gnu.org/cgit/bash.git/tree/redir.c?h=bash-5.3-rc1#n733
> 
> Fix this in the kernel by providing a ->may_create_in_sticky() inode op,
> similar to ->permission(), that, if provided, is called to:
> 
>  (1) see if an inode has the same owner as the parent on the path walked;
> 
>  (2) determine if the caller owns the file instead of checking the i_uid to
>      current_fsuid().
> 
> For kafs, the hook is implemented to see if:
> 
>  (1) the AFS owner IDs retrieved on the file and its parent directory by
>      FS.FetchStatus match;
> 
>  (2) if the server set the ADMINISTER bit in the access rights returned by
>      the FS.FetchStatus and suchlike for the key, indicating ownership by
>      the user specified by the key.
> 
> (Note that the owner IDs retrieved from an AuriStor YFS server may not fit
> in the kuid_t being 64-bit, so they need comparing directly).

There's a few other places where we compare vfsuids:

* may_delete()
  -> check_sticky()
     -> __check_sticky()

* may_follow_link()

* may_linkat()

* fsuidgid_has_mapping()

Anyone of those need special treatment on AFS as well?

> This can be tested by creating a sticky directory (the user must have a
> token to do this) and creating a file in it.  Then strace bash doing "echo
> foo >>file" and look at whether bash does a single, successful O_CREAT open
> on the file or whether that one fails and then bash does one without
> O_CREAT that succeeds.
> 
> Reported-by: Etienne Champetier <champetier.etienne@gmail.com>
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Marc Dionne <marc.dionne@auristor.com>
> cc: Jeffrey Altman <jaltman@auristor.com>
> cc: Chet Ramey <chet.ramey@case.edu>
> cc: Alexander Viro <viro@zeniv.linux.org.uk>
> cc: Christian Brauner <brauner@kernel.org>
> cc: Steve French <sfrench@samba.org>
> cc: linux-afs@lists.infradead.org
> cc: openafs-devel@openafs.org
> cc: linux-cifs@vger.kernel.org
> cc: linux-fsdevel@vger.kernel.org
> ---
>  fs/afs/dir.c       |    1 +
>  fs/afs/file.c      |    1 +
>  fs/afs/internal.h  |    2 ++
>  fs/afs/security.c  |   52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/namei.c         |   17 ++++++++++++-----
>  include/linux/fs.h |    2 ++
>  6 files changed, 70 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/afs/dir.c b/fs/afs/dir.c
> index 9e7b1fe82c27..27e565612bde 100644
> --- a/fs/afs/dir.c
> +++ b/fs/afs/dir.c
> @@ -65,6 +65,7 @@ const struct inode_operations afs_dir_inode_operations = {
>  	.permission	= afs_permission,
>  	.getattr	= afs_getattr,
>  	.setattr	= afs_setattr,
> +	.may_create_in_sticky = afs_may_create_in_sticky,
>  };
>  
>  const struct address_space_operations afs_dir_aops = {
> diff --git a/fs/afs/file.c b/fs/afs/file.c
> index fc15497608c6..dff48d0adec3 100644
> --- a/fs/afs/file.c
> +++ b/fs/afs/file.c
> @@ -47,6 +47,7 @@ const struct inode_operations afs_file_inode_operations = {
>  	.getattr	= afs_getattr,
>  	.setattr	= afs_setattr,
>  	.permission	= afs_permission,
> +	.may_create_in_sticky = afs_may_create_in_sticky,
>  };
>  
>  const struct address_space_operations afs_file_aops = {
> diff --git a/fs/afs/internal.h b/fs/afs/internal.h
> index 440b0e731093..4a5bb01606a8 100644
> --- a/fs/afs/internal.h
> +++ b/fs/afs/internal.h
> @@ -1495,6 +1495,8 @@ extern struct key *afs_request_key(struct afs_cell *);
>  extern struct key *afs_request_key_rcu(struct afs_cell *);
>  extern int afs_check_permit(struct afs_vnode *, struct key *, afs_access_t *);
>  extern int afs_permission(struct mnt_idmap *, struct inode *, int);
> +int afs_may_create_in_sticky(struct mnt_idmap *idmap, struct inode *inode,
> +			     struct path *path);
>  extern void __exit afs_clean_up_permit_cache(void);
>  
>  /*
> diff --git a/fs/afs/security.c b/fs/afs/security.c
> index 6a7744c9e2a2..9fd6e4b5c228 100644
> --- a/fs/afs/security.c
> +++ b/fs/afs/security.c
> @@ -477,6 +477,58 @@ int afs_permission(struct mnt_idmap *idmap, struct inode *inode,
>  	return ret;
>  }
>  
> +/*
> + * Perform the ownership checks for a file in a sticky directory on AFS.
> + *
> + * In the case of AFS, this means that:
> + *
> + * (1) the file and the directory have the same AFS ownership or
> + *
> + * (2) the file is owned by the AFS user represented by the token (e.g. from a
> + *     kerberos server) held in a key.
> + *
> + * Returns 0 if owned by me or has same owner as parent dir, 1 if not; can also
> + * return an error.
> + */
> +int afs_may_create_in_sticky(struct mnt_idmap *idmap, struct inode *inode,
> +			     struct path *path)
> +{
> +	struct afs_vnode *dvnode, *vnode = AFS_FS_I(inode);
> +	struct dentry *parent;
> +	struct key *key;
> +	afs_access_t access;
> +	int ret;
> +	s64 owner;
> +
> +	key = afs_request_key(vnode->volume->cell);
> +	if (IS_ERR(key))
> +		return PTR_ERR(key);
> +
> +	/* Get the owner's ID for the directory.  Ideally, we'd use RCU to
> +	 * access the parent rather than getting a ref.
> +	 */
> +	parent = dget_parent(path->dentry);
> +	dvnode = AFS_FS_I(d_backing_inode(parent));
> +	owner = dvnode->status.owner;
> +	dput(parent);
> +
> +	if (vnode->status.owner == owner) {
> +		ret = 0;
> +		goto error;
> +	}
> +
> +	/* Get the access rights for the key on this file. */
> +	ret = afs_check_permit(vnode, key, &access);
> +	if (ret < 0)
> +		goto error;
> +
> +	/* We get the ADMINISTER bit if we own the file. */
> +	ret = (access & AFS_ACE_ADMINISTER) ? 1 : 0;
> +error:
> +	key_put(key);
> +	return ret;
> +}
> +
>  void __exit afs_clean_up_permit_cache(void)
>  {
>  	int i;
> diff --git a/fs/namei.c b/fs/namei.c
> index 84a0e0b0111c..e52c91cbed2a 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1316,13 +1316,20 @@ static int may_create_in_sticky(struct mnt_idmap *idmap, struct nameidata *nd,
>  	if (S_ISFIFO(inode->i_mode) && !sysctl_protected_fifos)
>  		return 0;
>  
> -	i_vfsuid = i_uid_into_vfsuid(idmap, inode);
> +	if (unlikely(inode->i_op->may_create_in_sticky)) {
> +		int ret = inode->i_op->may_create_in_sticky(idmap, inode, &nd->path);

This should probably use an IOP flag just like we do for permission
handling.

>  
> -	if (vfsuid_eq(i_vfsuid, dir_vfsuid))
> -		return 0;
> +		if (ret <= 0) /* 1 if not owned by me or by parent dir. */
> +			return ret;
> +	} else {
> +		i_vfsuid = i_uid_into_vfsuid(idmap, inode);
>  
> -	if (vfsuid_eq_kuid(i_vfsuid, current_fsuid()))
> -		return 0;
> +		if (vfsuid_eq(i_vfsuid, dir_vfsuid))
> +			return 0;
> +
> +		if (vfsuid_eq_kuid(i_vfsuid, current_fsuid()))
> +			return 0;
> +	}
>  
>  	if (likely(dir_mode & 0002)) {
>  		audit_log_path_denied(AUDIT_ANOM_CREAT, "sticky_create");
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 016b0fe1536e..11122e169719 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2236,6 +2236,8 @@ struct inode_operations {
>  			    struct dentry *dentry, struct fileattr *fa);
>  	int (*fileattr_get)(struct dentry *dentry, struct fileattr *fa);
>  	struct offset_ctx *(*get_offset_ctx)(struct inode *inode);
> +	int (*may_create_in_sticky)(struct mnt_idmap *idmap, struct inode *inode,
> +				    struct path *path);
>  } ____cacheline_aligned;
>  
>  static inline int call_mmap(struct file *file, struct vm_area_struct *vma)
>
David Howells May 13, 2025, 8:30 a.m. UTC | #2
Christian Brauner <brauner@kernel.org> wrote:

> There's a few other places where we compare vfsuids:
> 
> * may_delete()
>   -> check_sticky()
>      -> __check_sticky()
> 
> * may_follow_link()
> 
> * may_linkat()
> 
> * fsuidgid_has_mapping()
> 
> Anyone of those need special treatment on AFS as well?

That's a good question.  I think it might be better to switch back to the v1
patch - which gives me two separate ops and provide a couple of vfs wrappers
for them and use them more widely.

So, perhaps:

	vfs_have_same_owner(inode1, inode2)

which indicates if the two inodes have the same ownership and:

	vfs_is_owned_by_me(inode)

which compares the inode's ownership to current_fsuid() by default.

The following places need to be considered for being changed:

 (*) chown_ok()
 (*) chgrp_ok()

     Should call vfs_is_owned_by_me().  Possibly these need to defer all their
     checks to the network filesystem as the interpretation of the target
     UID/GID depends on the netfs.

 (*) do_coredump()

     Should probably call vfs_is_owned_by_me() to check that the file created
     is owned by the caller - but the check that's there might be sufficient.

 (*) inode_owner_or_capable()

     Should call vfs_is_owned_by_me().  I'm not sure whether the namespace
     mapping makes sense in such a case, but it probably could be used.

 (*) vfs_setlease()

     Should call vfs_is_owned_by_me().  Actually, it should query if leasing
     is permitted.

     Also, setting locks could perhaps do with a permission call to the
     filesystem driver as AFS, for example, has a lock permission bit in the
     ACL, but since the AFS server checks that when the RPC call is made, it's
     probably unnecessary.

 (*) acl_permission_check()
 (*) posix_acl_permission()

     UIDs are part of these ACLs, so no change required.  AFS implements its
     own ACLs and evaluates them in ->permission() and on the server.

 (*) may_follow_link()

     Should call vfs_is_owned_by_me() and also vfs_have_same_owner() on the
     the link and its parent dir.  The latter only applies on world-writable
     sticky dirs.

 (*) may_create_in_sticky()

     The initial subject of this patch.  Should call vfs_is_owned_by_me() and
     also vfs_have_same_owner() both.

 (*) __check_sticky()

     Should call vfs_is_owned_by_me() on both the dir and the inode.

 (*) may_dedupe_file()

     Should call vfs_is_owned_by_me().

 (*) IMA policy ops.

     No idea.

David
Jeffrey E Altman May 13, 2025, 3:44 p.m. UTC | #3
I performed a review of the usage of vfsuid_eq_kuid() and vfsuid_eq().
I mostly agree with David's conclusions and add some additional insight 
into the behavior of AFS servers.

On 5/13/2025 4:30 AM, David Howells wrote:
> Christian Brauner <brauner@kernel.org> wrote:
>
>> There's a few other places where we compare vfsuids:
>>
>> * may_delete()
>>    -> check_sticky()
>>       -> __check_sticky()
>>
>> * may_follow_link()
>>
>> * may_linkat()
>>
>> * fsuidgid_has_mapping()
>>
>> Anyone of those need special treatment on AFS as well?
> That's a good question.  I think it might be better to switch back to the v1
> patch - which gives me two separate ops and provide a couple of vfs wrappers
> for them and use them more widely.
>
> So, perhaps:
>
> 	vfs_have_same_owner(inode1, inode2)
>
> which indicates if the two inodes have the same ownership and:
>
> 	vfs_is_owned_by_me(inode)
>
> which compares the inode's ownership to current_fsuid() by default.

The use of two distinct inode operations make the most sense to me.  An 
alternative is to provide one inode operation which sets two boolean 
output parameters:

int (*check_ownership)(struct inode *const inode, struct inode *const 
parent,int *is_owned_by_me, int *is_owned_by_parent); where 
'is_owned_by_me' or 'is_owned_by_parent' might be NULL if the answer is 
not required. However, I prefer David's suggestion.

> The following places need to be considered for being changed:
>
>   (*) chown_ok()
>   (*) chgrp_ok()
>
>       Should call vfs_is_owned_by_me().  Possibly these need to defer all their
>       checks to the network filesystem as the interpretation of the target
>       UID/GID depends on the netfs.

Since the late 1980s, afs servers do not permit changes to owner or 
group on files unless the caller is a member of the 
system:administrators group.  The file system clients cannot make this 
determination themselves.  If Linux wishes to further restrict the 
operation to current owner, then use of a vfs_is_owned_by_me() like 
inode operation should be used.

Something to consider for future AFS3 or YFS protocol changes is to 
report the right to chown|chgrp to the client as part of a the 
FetchStatus result set.

>   (*) do_coredump()
>
>       Should probably call vfs_is_owned_by_me() to check that the file created
>       is owned by the caller - but the check that's there might be sufficient.
I agree.
>   (*) inode_owner_or_capable()
>
>       Should call vfs_is_owned_by_me().
I agree.
>       I'm not sure whether the namespace
>       mapping makes sense in such a case, but it probably could be used.
>
>   (*) vfs_setlease()
>
>       Should call vfs_is_owned_by_me().  Actually, it should query if leasing
>       is permitted.
>
>       Also, setting locks could perhaps do with a permission call to the
>       filesystem driver as AFS, for example, has a lock permission bit in the
>       ACL, but since the AFS server checks that when the RPC call is made, it's
>       probably unnecessary.

The AFS server will grant locks based upon the following rules:

  * the caller is granted the PRSFS_LOCK right (Shared lock only)
  * the caller is granted the PRSFS_WRITE right (Shared or Exclusive lock)
  * the caller is the file owner and is granted the PRSFS_INSERT right
    (Shared or Exclusive lock)

The client has enough information to implement a lock permission check 
if there was such an inode operation.

>   (*) acl_permission_check()
>   (*) posix_acl_permission()
>
>       UIDs are part of these ACLs, so no change required.  AFS implements its
>       own ACLs and evaluates them in ->permission() and on the server.
acl_permission_check() and posix_acl_permission() will not be called for 
AFS.  However, it it probably worth adding the vfs_is_owned_by_me() to 
acl_permission_check() in case there is another network filesystem which 
requires non-uid ownership checks and wants to use generic_permission().
>   (*) may_follow_link()
>
>       Should call vfs_is_owned_by_me() and also vfs_have_same_owner() on the
>       the link and its parent dir.  The latter only applies on world-writable
>       sticky dirs.
I agree
>   (*) may_create_in_sticky()
>
>       The initial subject of this patch.  Should call vfs_is_owned_by_me() and
>       also vfs_have_same_owner() both.
I agree.
>   (*) __check_sticky()
>
>       Should call vfs_is_owned_by_me() on both the dir and the inode.
I agree.
>   (*) may_dedupe_file()
>
>       Should call vfs_is_owned_by_me().
I agree.
>
>   (*) IMA policy ops.
>
>       No idea.

I am not familiar with the Integrity Measurement Operations. However, 
looking at the usage of the ima_rule_entry fowner_op and fgroup_op 
operations, I do not believe the proposed vfs_is_owned_by_me() could be 
used to implement fowner_op.  If IMA should work filesystems which 
cannot rely upon local uid comparisons for owner and group, then I think 
the IMA fowner_op and fgroup_op would require an alternative 
implementation.  At the moment, IMA is unlikely to work properly with AFS.


> David

Jeffrey Altman
diff mbox series

Patch

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 9e7b1fe82c27..27e565612bde 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -65,6 +65,7 @@  const struct inode_operations afs_dir_inode_operations = {
 	.permission	= afs_permission,
 	.getattr	= afs_getattr,
 	.setattr	= afs_setattr,
+	.may_create_in_sticky = afs_may_create_in_sticky,
 };
 
 const struct address_space_operations afs_dir_aops = {
diff --git a/fs/afs/file.c b/fs/afs/file.c
index fc15497608c6..dff48d0adec3 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -47,6 +47,7 @@  const struct inode_operations afs_file_inode_operations = {
 	.getattr	= afs_getattr,
 	.setattr	= afs_setattr,
 	.permission	= afs_permission,
+	.may_create_in_sticky = afs_may_create_in_sticky,
 };
 
 const struct address_space_operations afs_file_aops = {
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 440b0e731093..4a5bb01606a8 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -1495,6 +1495,8 @@  extern struct key *afs_request_key(struct afs_cell *);
 extern struct key *afs_request_key_rcu(struct afs_cell *);
 extern int afs_check_permit(struct afs_vnode *, struct key *, afs_access_t *);
 extern int afs_permission(struct mnt_idmap *, struct inode *, int);
+int afs_may_create_in_sticky(struct mnt_idmap *idmap, struct inode *inode,
+			     struct path *path);
 extern void __exit afs_clean_up_permit_cache(void);
 
 /*
diff --git a/fs/afs/security.c b/fs/afs/security.c
index 6a7744c9e2a2..9fd6e4b5c228 100644
--- a/fs/afs/security.c
+++ b/fs/afs/security.c
@@ -477,6 +477,58 @@  int afs_permission(struct mnt_idmap *idmap, struct inode *inode,
 	return ret;
 }
 
+/*
+ * Perform the ownership checks for a file in a sticky directory on AFS.
+ *
+ * In the case of AFS, this means that:
+ *
+ * (1) the file and the directory have the same AFS ownership or
+ *
+ * (2) the file is owned by the AFS user represented by the token (e.g. from a
+ *     kerberos server) held in a key.
+ *
+ * Returns 0 if owned by me or has same owner as parent dir, 1 if not; can also
+ * return an error.
+ */
+int afs_may_create_in_sticky(struct mnt_idmap *idmap, struct inode *inode,
+			     struct path *path)
+{
+	struct afs_vnode *dvnode, *vnode = AFS_FS_I(inode);
+	struct dentry *parent;
+	struct key *key;
+	afs_access_t access;
+	int ret;
+	s64 owner;
+
+	key = afs_request_key(vnode->volume->cell);
+	if (IS_ERR(key))
+		return PTR_ERR(key);
+
+	/* Get the owner's ID for the directory.  Ideally, we'd use RCU to
+	 * access the parent rather than getting a ref.
+	 */
+	parent = dget_parent(path->dentry);
+	dvnode = AFS_FS_I(d_backing_inode(parent));
+	owner = dvnode->status.owner;
+	dput(parent);
+
+	if (vnode->status.owner == owner) {
+		ret = 0;
+		goto error;
+	}
+
+	/* Get the access rights for the key on this file. */
+	ret = afs_check_permit(vnode, key, &access);
+	if (ret < 0)
+		goto error;
+
+	/* We get the ADMINISTER bit if we own the file. */
+	ret = (access & AFS_ACE_ADMINISTER) ? 1 : 0;
+error:
+	key_put(key);
+	return ret;
+}
+
 void __exit afs_clean_up_permit_cache(void)
 {
 	int i;
diff --git a/fs/namei.c b/fs/namei.c
index 84a0e0b0111c..e52c91cbed2a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1316,13 +1316,20 @@  static int may_create_in_sticky(struct mnt_idmap *idmap, struct nameidata *nd,
 	if (S_ISFIFO(inode->i_mode) && !sysctl_protected_fifos)
 		return 0;
 
-	i_vfsuid = i_uid_into_vfsuid(idmap, inode);
+	if (unlikely(inode->i_op->may_create_in_sticky)) {
+		int ret = inode->i_op->may_create_in_sticky(idmap, inode, &nd->path);
 
-	if (vfsuid_eq(i_vfsuid, dir_vfsuid))
-		return 0;
+		if (ret <= 0) /* 1 if not owned by me or by parent dir. */
+			return ret;
+	} else {
+		i_vfsuid = i_uid_into_vfsuid(idmap, inode);
 
-	if (vfsuid_eq_kuid(i_vfsuid, current_fsuid()))
-		return 0;
+		if (vfsuid_eq(i_vfsuid, dir_vfsuid))
+			return 0;
+
+		if (vfsuid_eq_kuid(i_vfsuid, current_fsuid()))
+			return 0;
+	}
 
 	if (likely(dir_mode & 0002)) {
 		audit_log_path_denied(AUDIT_ANOM_CREAT, "sticky_create");
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 016b0fe1536e..11122e169719 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2236,6 +2236,8 @@  struct inode_operations {
 			    struct dentry *dentry, struct fileattr *fa);
 	int (*fileattr_get)(struct dentry *dentry, struct fileattr *fa);
 	struct offset_ctx *(*get_offset_ctx)(struct inode *inode);
+	int (*may_create_in_sticky)(struct mnt_idmap *idmap, struct inode *inode,
+				    struct path *path);
 } ____cacheline_aligned;
 
 static inline int call_mmap(struct file *file, struct vm_area_struct *vma)