diff mbox series

[RFC,v3,01/19] landlock: Support socket access-control

Message ID 20240904104824.1844082-2-ivanov.mikhail1@huawei-partners.com
State RFC
Headers show
Series Support socket access-control | expand

Commit Message

Mikhail Ivanov Sept. 4, 2024, 10:48 a.m. UTC
Landlock implements the `LANDLOCK_RULE_NET_PORT` rule type, which provides
fine-grained control of actions for a specific protocol. Any action or
protocol that is not supported by this rule can not be controlled. As a
result, protocols for which fine-grained control is not supported can be
used in a sandboxed system and lead to vulnerabilities or unexpected
behavior.

Controlling the protocols used will allow to use only those that are
necessary for the system and/or which have fine-grained Landlock control
through others types of rules (e.g. TCP bind/connect control with
`LANDLOCK_RULE_NET_PORT`, UNIX bind control with
`LANDLOCK_RULE_PATH_BENEATH`). Consider following examples:

* Server may want to use only TCP sockets for which there is fine-grained
  control of bind(2) and connect(2) actions [1].
* System that does not need a network or that may want to disable network
  for security reasons (e.g. [2]) can achieve this by restricting the use
  of all possible protocols.

This patch implements such control by restricting socket creation in a
sandboxed process.

Add `LANDLOCK_RULE_SOCKET` rule type that restricts actions on sockets.
This rule uses values of address family and socket type (Cf. socket(2))
to determine sockets that should be restricted. This is represented in a
landlock_socket_attr struct:

  struct landlock_socket_attr {
    __u64 allowed_access;
    int family; /* same as domain in socket(2) */
    int type; /* see socket(2) */
  };

Support socket rule storage in landlock ruleset.

Add `LANDLOCK_ACCESS_SOCKET_CREATE` access right that corresponds to the
creation of user space sockets. In the case of connection-based socket
types, this does not restrict the actions that result in creation of
sockets used for messaging between already existing endpoints
(e.g. accept(2), SCTP_SOCKOPT_PEELOFF). Also, this does not restrict any
other socket-related actions such as bind(2) or send(2). All restricted
actions are enlisted in the documentation of this access right.

As with all other access rights, using `LANDLOCK_ACCESS_SOCKET_CREATE`
does not affect the actions on sockets which were created before
sandboxing.

Add socket.c file that will contain socket rules management and hooks.

Implement helper pack_socket_key() to convert 32-bit family and type
alues into uintptr_t. This is possible due to the fact that these
values are limited to AF_MAX (=46), SOCK_MAX (=11) constants. Assumption
is checked in build-time by the helper.

Support socket rules in landlock syscalls. Change ABI version to 6.

[1] https://lore.kernel.org/all/ZJvy2SViorgc+cZI@google.com/
[2] https://cr.yp.to/unix/disablenetwork.html

Closes: https://github.com/landlock-lsm/linux/issues/6
Signed-off-by: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
---
Changes since v2:
* Refactors access_mask for `LANDLOCK_RULE_SOCKET`.
* Changes type of 'socket_key.packed' from 'uintptr_t' to 'unsigned int'
  in order to fix UB in pack_socket_key().
* Accepts (AF_INET, SOCK_PACKET) as an alias for (AF_PACKET, SOCK_PACKET)
  in landlock_append_socket_rule().
* Fixes documentation.
* Rewrites commit message.
* Fixes grammar.
* Minor fixes.

Changes since v1:
* Reverts landlock_key.data type from u64 to uinptr_t.
* Adds helper to pack domain and type values into uintptr_t.
* Denies inserting socket rule with invalid family and type.
* Renames 'domain' to 'family' in landlock_socket_attr.
* Updates ABI version to 6 since ioctl patches changed it to 5.
* Formats code with clang-format.
* Minor fixes.
---
 include/uapi/linux/landlock.h                | 61 ++++++++++++++++-
 security/landlock/Makefile                   |  2 +-
 security/landlock/limits.h                   |  4 ++
 security/landlock/ruleset.c                  | 33 +++++++++-
 security/landlock/ruleset.h                  | 45 ++++++++++++-
 security/landlock/socket.c                   | 69 ++++++++++++++++++++
 security/landlock/socket.h                   | 17 +++++
 security/landlock/syscalls.c                 | 66 +++++++++++++++++--
 tools/testing/selftests/landlock/base_test.c |  2 +-
 9 files changed, 287 insertions(+), 12 deletions(-)
 create mode 100644 security/landlock/socket.c
 create mode 100644 security/landlock/socket.h

Comments

Günther Noack Sept. 6, 2024, 1:09 p.m. UTC | #1
Hello!

Just a few wording nits and a remark on using maybe u8, u16, u32.

On Wed, Sep 04, 2024 at 06:48:06PM +0800, Mikhail Ivanov wrote:
> Landlock implements the `LANDLOCK_RULE_NET_PORT` rule type, which provides
> fine-grained control of actions for a specific protocol. Any action or
> protocol that is not supported by this rule can not be controlled. As a
> result, protocols for which fine-grained control is not supported can be
> used in a sandboxed system and lead to vulnerabilities or unexpected
> behavior.
> 
> Controlling the protocols used will allow to use only those that are
> necessary for the system and/or which have fine-grained Landlock control
> through others types of rules (e.g. TCP bind/connect control with
> `LANDLOCK_RULE_NET_PORT`, UNIX bind control with
> `LANDLOCK_RULE_PATH_BENEATH`). Consider following examples:
> 
> * Server may want to use only TCP sockets for which there is fine-grained
>   control of bind(2) and connect(2) actions [1].
> * System that does not need a network or that may want to disable network
>   for security reasons (e.g. [2]) can achieve this by restricting the use
>   of all possible protocols.
> 
> This patch implements such control by restricting socket creation in a
> sandboxed process.
> 
> Add `LANDLOCK_RULE_SOCKET` rule type that restricts actions on sockets.
> This rule uses values of address family and socket type (Cf. socket(2))
> to determine sockets that should be restricted. This is represented in a
> landlock_socket_attr struct:
> 
>   struct landlock_socket_attr {
>     __u64 allowed_access;
>     int family; /* same as domain in socket(2) */
>     int type; /* see socket(2) */
>   };
> 
> Support socket rule storage in landlock ruleset.
> 
> Add `LANDLOCK_ACCESS_SOCKET_CREATE` access right that corresponds to the
> creation of user space sockets. In the case of connection-based socket
> types, this does not restrict the actions that result in creation of
> sockets used for messaging between already existing endpoints
> (e.g. accept(2), SCTP_SOCKOPT_PEELOFF). Also, this does not restrict any
> other socket-related actions such as bind(2) or send(2). All restricted
> actions are enlisted in the documentation of this access right.
> 
> As with all other access rights, using `LANDLOCK_ACCESS_SOCKET_CREATE`
> does not affect the actions on sockets which were created before
> sandboxing.
> 
> Add socket.c file that will contain socket rules management and hooks.
> 
> Implement helper pack_socket_key() to convert 32-bit family and type
> alues into uintptr_t. This is possible due to the fact that these
  ^^^^^
  values

> values are limited to AF_MAX (=46), SOCK_MAX (=11) constants. Assumption
> is checked in build-time by the helper.
> 
> Support socket rules in landlock syscalls. Change ABI version to 6.
> 
> [1] https://lore.kernel.org/all/ZJvy2SViorgc+cZI@google.com/
> [2] https://cr.yp.to/unix/disablenetwork.html
> 
> Closes: https://github.com/landlock-lsm/linux/issues/6
> Signed-off-by: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
> ---
> Changes since v2:
> * Refactors access_mask for `LANDLOCK_RULE_SOCKET`.
> * Changes type of 'socket_key.packed' from 'uintptr_t' to 'unsigned int'
>   in order to fix UB in pack_socket_key().
> * Accepts (AF_INET, SOCK_PACKET) as an alias for (AF_PACKET, SOCK_PACKET)
>   in landlock_append_socket_rule().
> * Fixes documentation.
> * Rewrites commit message.
> * Fixes grammar.
> * Minor fixes.
> 
> Changes since v1:
> * Reverts landlock_key.data type from u64 to uinptr_t.
> * Adds helper to pack domain and type values into uintptr_t.
> * Denies inserting socket rule with invalid family and type.
> * Renames 'domain' to 'family' in landlock_socket_attr.
> * Updates ABI version to 6 since ioctl patches changed it to 5.
> * Formats code with clang-format.
> * Minor fixes.
> ---
>  include/uapi/linux/landlock.h                | 61 ++++++++++++++++-
>  security/landlock/Makefile                   |  2 +-
>  security/landlock/limits.h                   |  4 ++
>  security/landlock/ruleset.c                  | 33 +++++++++-
>  security/landlock/ruleset.h                  | 45 ++++++++++++-
>  security/landlock/socket.c                   | 69 ++++++++++++++++++++
>  security/landlock/socket.h                   | 17 +++++
>  security/landlock/syscalls.c                 | 66 +++++++++++++++++--
>  tools/testing/selftests/landlock/base_test.c |  2 +-
>  9 files changed, 287 insertions(+), 12 deletions(-)
>  create mode 100644 security/landlock/socket.c
>  create mode 100644 security/landlock/socket.h
> 
> diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
> index 2c8dbc74b955..d9da9f2c0640 100644
> --- a/include/uapi/linux/landlock.h
> +++ b/include/uapi/linux/landlock.h
> @@ -44,6 +44,13 @@ struct landlock_ruleset_attr {
>  	 * flags`_).
>  	 */
>  	__u64 handled_access_net;
> +
> +	/**
> +	 * @handled_access_socket: Bitmask of actions (cf. `Socket flags`_)
> +	 * that is handled by this ruleset and should then be forbidden if no
> +	 * rule explicitly allow them.
> +	 */
> +	__u64 handled_access_socket;
>  };
>  
>  /*
> @@ -72,6 +79,11 @@ enum landlock_rule_type {
>  	 * landlock_net_port_attr .
>  	 */
>  	LANDLOCK_RULE_NET_PORT,
> +	/**
> +	 * @LANDLOCK_RULE_SOCKET: Type of a &struct
> +	 * landlock_socket_attr .
> +	 */
> +	LANDLOCK_RULE_SOCKET,
>  };
>  
>  /**
> @@ -123,6 +135,32 @@ struct landlock_net_port_attr {
>  	__u64 port;
>  };
>  
> +/**
> + * struct landlock_socket_attr - Socket definition
> + *
> + * Argument of sys_landlock_add_rule().
> + */
> +struct landlock_socket_attr {
> +	/**
> +	 * @allowed_access: Bitmask of allowed access for a socket
> +	 * (cf. `Socket flags`_).
> +	 */
> +	__u64 allowed_access;
> +	/**
> +	 * @family: Protocol family used for communication
> +	 * (same as domain in socket(2)).
> +	 *
> +	 * This argument is considered valid if it is in the range [0, AF_MAX).
> +	 */
> +	int family;
> +	/**
> +	 * @type: Socket type (see socket(2)).
> +	 *
> +	 * This argument is considered valid if it is in the range [0, SOCK_MAX).
> +	 */
> +	int type;
> +};
> +
>  /**
>   * DOC: fs_access
>   *
> @@ -259,7 +297,7 @@ struct landlock_net_port_attr {
>   * DOC: net_access
>   *
>   * Network flags
> - * ~~~~~~~~~~~~~~~~
> + * ~~~~~~~~~~~~~
>   *
>   * These flags enable to restrict a sandboxed process to a set of network
>   * actions. This is supported since the Landlock ABI version 4.
> @@ -274,4 +312,25 @@ struct landlock_net_port_attr {
>  #define LANDLOCK_ACCESS_NET_BIND_TCP			(1ULL << 0)
>  #define LANDLOCK_ACCESS_NET_CONNECT_TCP			(1ULL << 1)
>  /* clang-format on */
> +
> +/**
> + * DOC: socket_access
> + *
> + * Socket flags
> + * ~~~~~~~~~~~~
> + *
> + * These flags restrict actions on sockets for a sandboxed process (e.g. socket
> + * creation). Sockets opened before sandboxing are not subject to these
> + * restrictions. This is supported since the Landlock ABI version 6.
> + *
> + * The following access right apply only to sockets:
                                 ^^^^^
				 applies

> + *
> + * - %LANDLOCK_ACCESS_SOCKET_CREATE: Create an user space socket. This access
                                               ^^
					       a

> + *   right restricts following operations:
                       ^ ...*the* following operations:

> + *   * :manpage:`socket(2)`, :manpage:`socketpair(2)`,
> + *   * ``IORING_OP_SOCKET`` io_uring operation (see :manpage:`io_uring_enter(2)`),
> + */
> +/* clang-format off */
> +#define LANDLOCK_ACCESS_SOCKET_CREATE			(1ULL << 0)
> +/* clang-format on */
>  #endif /* _UAPI_LINUX_LANDLOCK_H */
> diff --git a/security/landlock/Makefile b/security/landlock/Makefile
> index b4538b7cf7d2..ff1dd98f6a1b 100644
> --- a/security/landlock/Makefile
> +++ b/security/landlock/Makefile
> @@ -1,6 +1,6 @@
>  obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
>  
>  landlock-y := setup.o syscalls.o object.o ruleset.o \
> -	cred.o task.o fs.o
> +	cred.o task.o fs.o socket.o
>  
>  landlock-$(CONFIG_INET) += net.o
> diff --git a/security/landlock/limits.h b/security/landlock/limits.h
> index 4eb643077a2a..2c04dca414c7 100644
> --- a/security/landlock/limits.h
> +++ b/security/landlock/limits.h
> @@ -26,6 +26,10 @@
>  #define LANDLOCK_MASK_ACCESS_NET	((LANDLOCK_LAST_ACCESS_NET << 1) - 1)
>  #define LANDLOCK_NUM_ACCESS_NET		__const_hweight64(LANDLOCK_MASK_ACCESS_NET)
>  
> +#define LANDLOCK_LAST_ACCESS_SOCKET	    LANDLOCK_ACCESS_SOCKET_CREATE
> +#define LANDLOCK_MASK_ACCESS_SOCKET	    ((LANDLOCK_LAST_ACCESS_SOCKET << 1) - 1)
> +#define LANDLOCK_NUM_ACCESS_SOCKET		__const_hweight64(LANDLOCK_MASK_ACCESS_SOCKET)
> +
>  /* clang-format on */
>  
>  #endif /* _SECURITY_LANDLOCK_LIMITS_H */
> diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c
> index 6ff232f58618..9bf5e5e88544 100644
> --- a/security/landlock/ruleset.c
> +++ b/security/landlock/ruleset.c
> @@ -40,6 +40,7 @@ static struct landlock_ruleset *create_ruleset(const u32 num_layers)
>  #if IS_ENABLED(CONFIG_INET)
>  	new_ruleset->root_net_port = RB_ROOT;
>  #endif /* IS_ENABLED(CONFIG_INET) */
> +	new_ruleset->root_socket = RB_ROOT;
>  
>  	new_ruleset->num_layers = num_layers;
>  	/*
> @@ -52,12 +53,13 @@ static struct landlock_ruleset *create_ruleset(const u32 num_layers)
>  
>  struct landlock_ruleset *
>  landlock_create_ruleset(const access_mask_t fs_access_mask,
> -			const access_mask_t net_access_mask)
> +			const access_mask_t net_access_mask,
> +			const access_mask_t socket_access_mask)
>  {
>  	struct landlock_ruleset *new_ruleset;
>  
>  	/* Informs about useless ruleset. */
> -	if (!fs_access_mask && !net_access_mask)
> +	if (!fs_access_mask && !net_access_mask && !socket_access_mask)
>  		return ERR_PTR(-ENOMSG);
>  	new_ruleset = create_ruleset(1);
>  	if (IS_ERR(new_ruleset))
> @@ -66,6 +68,9 @@ landlock_create_ruleset(const access_mask_t fs_access_mask,
>  		landlock_add_fs_access_mask(new_ruleset, fs_access_mask, 0);
>  	if (net_access_mask)
>  		landlock_add_net_access_mask(new_ruleset, net_access_mask, 0);
> +	if (socket_access_mask)
> +		landlock_add_socket_access_mask(new_ruleset, socket_access_mask,
> +						0);
>  	return new_ruleset;
>  }
>  
> @@ -89,6 +94,9 @@ static bool is_object_pointer(const enum landlock_key_type key_type)
>  		return false;
>  #endif /* IS_ENABLED(CONFIG_INET) */
>  
> +	case LANDLOCK_KEY_SOCKET:
> +		return false;
> +
>  	default:
>  		WARN_ON_ONCE(1);
>  		return false;
> @@ -146,6 +154,9 @@ static struct rb_root *get_root(struct landlock_ruleset *const ruleset,
>  		return &ruleset->root_net_port;
>  #endif /* IS_ENABLED(CONFIG_INET) */
>  
> +	case LANDLOCK_KEY_SOCKET:
> +		return &ruleset->root_socket;
> +
>  	default:
>  		WARN_ON_ONCE(1);
>  		return ERR_PTR(-EINVAL);
> @@ -395,6 +406,11 @@ static int merge_ruleset(struct landlock_ruleset *const dst,
>  		goto out_unlock;
>  #endif /* IS_ENABLED(CONFIG_INET) */
>  
> +	/* Merges the @src socket tree. */
> +	err = merge_tree(dst, src, LANDLOCK_KEY_SOCKET);
> +	if (err)
> +		goto out_unlock;
> +
>  out_unlock:
>  	mutex_unlock(&src->lock);
>  	mutex_unlock(&dst->lock);
> @@ -458,6 +474,11 @@ static int inherit_ruleset(struct landlock_ruleset *const parent,
>  		goto out_unlock;
>  #endif /* IS_ENABLED(CONFIG_INET) */
>  
> +	/* Copies the @parent socket tree. */
> +	err = inherit_tree(parent, child, LANDLOCK_KEY_SOCKET);
> +	if (err)
> +		goto out_unlock;
> +
>  	if (WARN_ON_ONCE(child->num_layers <= parent->num_layers)) {
>  		err = -EINVAL;
>  		goto out_unlock;
> @@ -494,6 +515,10 @@ static void free_ruleset(struct landlock_ruleset *const ruleset)
>  		free_rule(freeme, LANDLOCK_KEY_NET_PORT);
>  #endif /* IS_ENABLED(CONFIG_INET) */
>  
> +	rbtree_postorder_for_each_entry_safe(freeme, next,
> +					     &ruleset->root_socket, node)
> +		free_rule(freeme, LANDLOCK_KEY_SOCKET);
> +
>  	put_hierarchy(ruleset->hierarchy);
>  	kfree(ruleset);
>  }
> @@ -704,6 +729,10 @@ landlock_init_layer_masks(const struct landlock_ruleset *const domain,
>  		break;
>  #endif /* IS_ENABLED(CONFIG_INET) */
>  
> +	case LANDLOCK_KEY_SOCKET:
> +		get_access_mask = landlock_get_socket_access_mask;
> +		num_access = LANDLOCK_NUM_ACCESS_SOCKET;
> +		break;
>  	default:
>  		WARN_ON_ONCE(1);
>  		return 0;
> diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h
> index 0f1b5b4c8f6b..5cf7251e11ca 100644
> --- a/security/landlock/ruleset.h
> +++ b/security/landlock/ruleset.h
> @@ -42,6 +42,7 @@ static_assert(sizeof(unsigned long) >= sizeof(access_mask_t));
>  struct access_masks {
>  	access_mask_t fs : LANDLOCK_NUM_ACCESS_FS;
>  	access_mask_t net : LANDLOCK_NUM_ACCESS_NET;
> +	access_mask_t socket : LANDLOCK_NUM_ACCESS_SOCKET;
>  };
>  
>  typedef u16 layer_mask_t;
> @@ -92,6 +93,12 @@ enum landlock_key_type {
>  	 * node keys.
>  	 */
>  	LANDLOCK_KEY_NET_PORT,
> +
> +	/**
> +	 * @LANDLOCK_KEY_SOCKET: Type of &landlock_ruleset.root_socket's
> +	 * node keys.
> +	 */
> +	LANDLOCK_KEY_SOCKET,
>  };
>  
>  /**
> @@ -177,6 +184,15 @@ struct landlock_ruleset {
>  	struct rb_root root_net_port;
>  #endif /* IS_ENABLED(CONFIG_INET) */
>  
> +	/**
> +	 * @root_socket: Root of a red-black tree containing &struct
> +	 * landlock_rule nodes with socket type, described by (family, type)
> +	 * pair (see socket(2)). Once a ruleset is tied to a
> +	 * process (i.e. as a domain), this tree is immutable until @usage
> +	 * reaches zero.
> +	 */
> +	struct rb_root root_socket;
> +
>  	/**
>  	 * @hierarchy: Enables hierarchy identification even when a parent
>  	 * domain vanishes.  This is needed for the ptrace protection.
> @@ -215,8 +231,10 @@ struct landlock_ruleset {
>  			 */
>  			u32 num_layers;
>  			/**
> -			 * @access_masks: Contains the subset of filesystem and
> -			 * network actions that are restricted by a ruleset.
> +			 * @access_masks: Contains the subset of filesystem,
> +			 * network and socket actions that are restricted by
> +			 * a ruleset.
> +			 *
>  			 * A domain saves all layers of merged rulesets in a
>  			 * stack (FAM), starting from the first layer to the
>  			 * last one.  These layers are used when merging
> @@ -233,7 +251,8 @@ struct landlock_ruleset {
>  
>  struct landlock_ruleset *
>  landlock_create_ruleset(const access_mask_t access_mask_fs,
> -			const access_mask_t access_mask_net);
> +			const access_mask_t access_mask_net,
> +			const access_mask_t access_mask_socket);
>  
>  void landlock_put_ruleset(struct landlock_ruleset *const ruleset);
>  void landlock_put_ruleset_deferred(struct landlock_ruleset *const ruleset);
> @@ -280,6 +299,19 @@ landlock_add_net_access_mask(struct landlock_ruleset *const ruleset,
>  	ruleset->access_masks[layer_level].net |= net_mask;
>  }
>  
> +static inline void
> +landlock_add_socket_access_mask(struct landlock_ruleset *const ruleset,
> +				const access_mask_t socket_access_mask,
> +				const u16 layer_level)
> +{
> +	access_mask_t socket_mask = socket_access_mask &
> +				    LANDLOCK_MASK_ACCESS_SOCKET;
> +
> +	/* Should already be checked in sys_landlock_create_ruleset(). */
> +	WARN_ON_ONCE(socket_access_mask != socket_mask);
> +	ruleset->access_masks[layer_level].socket |= socket_mask;
> +}
> +
>  static inline access_mask_t
>  landlock_get_raw_fs_access_mask(const struct landlock_ruleset *const ruleset,
>  				const u16 layer_level)
> @@ -303,6 +335,13 @@ landlock_get_net_access_mask(const struct landlock_ruleset *const ruleset,
>  	return ruleset->access_masks[layer_level].net;
>  }
>  
> +static inline access_mask_t
> +landlock_get_socket_access_mask(const struct landlock_ruleset *const ruleset,
> +				const u16 layer_level)
> +{
> +	return ruleset->access_masks[layer_level].socket;
> +}
> +
>  bool landlock_unmask_layers(const struct landlock_rule *const rule,
>  			    const access_mask_t access_request,
>  			    layer_mask_t (*const layer_masks)[],
> diff --git a/security/landlock/socket.c b/security/landlock/socket.c
> new file mode 100644
> index 000000000000..cad89bb91678
> --- /dev/null
> +++ b/security/landlock/socket.c
> @@ -0,0 +1,69 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Landlock LSM - Socket management and hooks
> + *
> + * Copyright © 2024 Huawei Tech. Co., Ltd.
> + */
> +
> +#include <linux/net.h>
> +#include <linux/socket.h>
> +#include <linux/stddef.h>
> +
> +#include "limits.h"
> +#include "ruleset.h"
> +#include "socket.h"
> +
> +static uintptr_t pack_socket_key(const int family, const int type)
> +{
> +	union {
> +		struct {
> +			unsigned short family, type;
> +		} __packed data;
> +		unsigned int packed;
> +	} socket_key;

Maybe a slightly more obvious way would be to use the u8, u16 and u32 types
here?  Then it would be more directly visible that we have considered this
correctly and that not one of the variables has an odd size on an obscure
platform somewhere.

> +
> +	/*
> +	 * Checks that all supported socket families and types can be stored
> +	 * in socket_key.
> +	 */
> +	BUILD_BUG_ON(AF_MAX >= (typeof(socket_key.data.family))~0);
> +	BUILD_BUG_ON(SOCK_MAX >= (typeof(socket_key.data.type))~0);
> +
> +	/* Checks that socket_key can be stored in landlock_key. */
> +	BUILD_BUG_ON(sizeof(socket_key.data) > sizeof(socket_key.packed));
> +	BUILD_BUG_ON(sizeof(socket_key.packed) >
> +		     sizeof_field(union landlock_key, data));
> +
> +	socket_key.data.family = (unsigned short)family;
> +	socket_key.data.type = (unsigned short)type;
> +
> +	return socket_key.packed;
> +}

—Günther
Mikhail Ivanov Sept. 9, 2024, 7:23 a.m. UTC | #2
On 9/6/2024 4:09 PM, Günther Noack wrote:
> Hello!
> 
> Just a few wording nits and a remark on using maybe u8, u16, u32.
> 
> On Wed, Sep 04, 2024 at 06:48:06PM +0800, Mikhail Ivanov wrote:
>> Landlock implements the `LANDLOCK_RULE_NET_PORT` rule type, which provides
>> fine-grained control of actions for a specific protocol. Any action or
>> protocol that is not supported by this rule can not be controlled. As a
>> result, protocols for which fine-grained control is not supported can be
>> used in a sandboxed system and lead to vulnerabilities or unexpected
>> behavior.
>>
>> Controlling the protocols used will allow to use only those that are
>> necessary for the system and/or which have fine-grained Landlock control
>> through others types of rules (e.g. TCP bind/connect control with
>> `LANDLOCK_RULE_NET_PORT`, UNIX bind control with
>> `LANDLOCK_RULE_PATH_BENEATH`). Consider following examples:
>>
>> * Server may want to use only TCP sockets for which there is fine-grained
>>    control of bind(2) and connect(2) actions [1].
>> * System that does not need a network or that may want to disable network
>>    for security reasons (e.g. [2]) can achieve this by restricting the use
>>    of all possible protocols.
>>
>> This patch implements such control by restricting socket creation in a
>> sandboxed process.
>>
>> Add `LANDLOCK_RULE_SOCKET` rule type that restricts actions on sockets.
>> This rule uses values of address family and socket type (Cf. socket(2))
>> to determine sockets that should be restricted. This is represented in a
>> landlock_socket_attr struct:
>>
>>    struct landlock_socket_attr {
>>      __u64 allowed_access;
>>      int family; /* same as domain in socket(2) */
>>      int type; /* see socket(2) */
>>    };
>>
>> Support socket rule storage in landlock ruleset.
>>
>> Add `LANDLOCK_ACCESS_SOCKET_CREATE` access right that corresponds to the
>> creation of user space sockets. In the case of connection-based socket
>> types, this does not restrict the actions that result in creation of
>> sockets used for messaging between already existing endpoints
>> (e.g. accept(2), SCTP_SOCKOPT_PEELOFF). Also, this does not restrict any
>> other socket-related actions such as bind(2) or send(2). All restricted
>> actions are enlisted in the documentation of this access right.
>>
>> As with all other access rights, using `LANDLOCK_ACCESS_SOCKET_CREATE`
>> does not affect the actions on sockets which were created before
>> sandboxing.
>>
>> Add socket.c file that will contain socket rules management and hooks.
>>
>> Implement helper pack_socket_key() to convert 32-bit family and type
>> alues into uintptr_t. This is possible due to the fact that these
>    ^^^^^
>    values

thanks! Will be fixed

> 
>> values are limited to AF_MAX (=46), SOCK_MAX (=11) constants. Assumption
>> is checked in build-time by the helper.
>>
>> Support socket rules in landlock syscalls. Change ABI version to 6.
>>
>> [1] https://lore.kernel.org/all/ZJvy2SViorgc+cZI@google.com/
>> [2] https://cr.yp.to/unix/disablenetwork.html
>>
>> Closes: https://github.com/landlock-lsm/linux/issues/6
>> Signed-off-by: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
>> ---
>> Changes since v2:
>> * Refactors access_mask for `LANDLOCK_RULE_SOCKET`.
>> * Changes type of 'socket_key.packed' from 'uintptr_t' to 'unsigned int'
>>    in order to fix UB in pack_socket_key().
>> * Accepts (AF_INET, SOCK_PACKET) as an alias for (AF_PACKET, SOCK_PACKET)
>>    in landlock_append_socket_rule().
>> * Fixes documentation.
>> * Rewrites commit message.
>> * Fixes grammar.
>> * Minor fixes.
>>
>> Changes since v1:
>> * Reverts landlock_key.data type from u64 to uinptr_t.
>> * Adds helper to pack domain and type values into uintptr_t.
>> * Denies inserting socket rule with invalid family and type.
>> * Renames 'domain' to 'family' in landlock_socket_attr.
>> * Updates ABI version to 6 since ioctl patches changed it to 5.
>> * Formats code with clang-format.
>> * Minor fixes.
>> ---
>>   include/uapi/linux/landlock.h                | 61 ++++++++++++++++-
>>   security/landlock/Makefile                   |  2 +-
>>   security/landlock/limits.h                   |  4 ++
>>   security/landlock/ruleset.c                  | 33 +++++++++-
>>   security/landlock/ruleset.h                  | 45 ++++++++++++-
>>   security/landlock/socket.c                   | 69 ++++++++++++++++++++
>>   security/landlock/socket.h                   | 17 +++++
>>   security/landlock/syscalls.c                 | 66 +++++++++++++++++--
>>   tools/testing/selftests/landlock/base_test.c |  2 +-
>>   9 files changed, 287 insertions(+), 12 deletions(-)
>>   create mode 100644 security/landlock/socket.c
>>   create mode 100644 security/landlock/socket.h
>>
>> diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
>> index 2c8dbc74b955..d9da9f2c0640 100644
>> --- a/include/uapi/linux/landlock.h
>> +++ b/include/uapi/linux/landlock.h
>> @@ -44,6 +44,13 @@ struct landlock_ruleset_attr {
>>   	 * flags`_).
>>   	 */
>>   	__u64 handled_access_net;
>> +
>> +	/**
>> +	 * @handled_access_socket: Bitmask of actions (cf. `Socket flags`_)
>> +	 * that is handled by this ruleset and should then be forbidden if no
>> +	 * rule explicitly allow them.
>> +	 */
>> +	__u64 handled_access_socket;
>>   };
>>   
>>   /*
>> @@ -72,6 +79,11 @@ enum landlock_rule_type {
>>   	 * landlock_net_port_attr .
>>   	 */
>>   	LANDLOCK_RULE_NET_PORT,
>> +	/**
>> +	 * @LANDLOCK_RULE_SOCKET: Type of a &struct
>> +	 * landlock_socket_attr .
>> +	 */
>> +	LANDLOCK_RULE_SOCKET,
>>   };
>>   
>>   /**
>> @@ -123,6 +135,32 @@ struct landlock_net_port_attr {
>>   	__u64 port;
>>   };
>>   
>> +/**
>> + * struct landlock_socket_attr - Socket definition
>> + *
>> + * Argument of sys_landlock_add_rule().
>> + */
>> +struct landlock_socket_attr {
>> +	/**
>> +	 * @allowed_access: Bitmask of allowed access for a socket
>> +	 * (cf. `Socket flags`_).
>> +	 */
>> +	__u64 allowed_access;
>> +	/**
>> +	 * @family: Protocol family used for communication
>> +	 * (same as domain in socket(2)).
>> +	 *
>> +	 * This argument is considered valid if it is in the range [0, AF_MAX).
>> +	 */
>> +	int family;
>> +	/**
>> +	 * @type: Socket type (see socket(2)).
>> +	 *
>> +	 * This argument is considered valid if it is in the range [0, SOCK_MAX).
>> +	 */
>> +	int type;
>> +};
>> +
>>   /**
>>    * DOC: fs_access
>>    *
>> @@ -259,7 +297,7 @@ struct landlock_net_port_attr {
>>    * DOC: net_access
>>    *
>>    * Network flags
>> - * ~~~~~~~~~~~~~~~~
>> + * ~~~~~~~~~~~~~
>>    *
>>    * These flags enable to restrict a sandboxed process to a set of network
>>    * actions. This is supported since the Landlock ABI version 4.
>> @@ -274,4 +312,25 @@ struct landlock_net_port_attr {
>>   #define LANDLOCK_ACCESS_NET_BIND_TCP			(1ULL << 0)
>>   #define LANDLOCK_ACCESS_NET_CONNECT_TCP			(1ULL << 1)
>>   /* clang-format on */
>> +
>> +/**
>> + * DOC: socket_access
>> + *
>> + * Socket flags
>> + * ~~~~~~~~~~~~
>> + *
>> + * These flags restrict actions on sockets for a sandboxed process (e.g. socket
>> + * creation). Sockets opened before sandboxing are not subject to these
>> + * restrictions. This is supported since the Landlock ABI version 6.
>> + *
>> + * The following access right apply only to sockets:
>                                   ^^^^^
> 				 applies

Thank you! will be fixed

> 
>> + *
>> + * - %LANDLOCK_ACCESS_SOCKET_CREATE: Create an user space socket. This access
>                                                 ^^
> 					       a

will be fixed

> 
>> + *   right restricts following operations:
>                         ^ ...*the* following operations:

will be fixed

> 
>> + *   * :manpage:`socket(2)`, :manpage:`socketpair(2)`,
>> + *   * ``IORING_OP_SOCKET`` io_uring operation (see :manpage:`io_uring_enter(2)`),
>> + */
>> +/* clang-format off */
>> +#define LANDLOCK_ACCESS_SOCKET_CREATE			(1ULL << 0)
>> +/* clang-format on */
>>   #endif /* _UAPI_LINUX_LANDLOCK_H */
>> diff --git a/security/landlock/Makefile b/security/landlock/Makefile
>> index b4538b7cf7d2..ff1dd98f6a1b 100644
>> --- a/security/landlock/Makefile
>> +++ b/security/landlock/Makefile
>> @@ -1,6 +1,6 @@
>>   obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
>>   
>>   landlock-y := setup.o syscalls.o object.o ruleset.o \
>> -	cred.o task.o fs.o
>> +	cred.o task.o fs.o socket.o
>>   
>>   landlock-$(CONFIG_INET) += net.o
>> diff --git a/security/landlock/limits.h b/security/landlock/limits.h
>> index 4eb643077a2a..2c04dca414c7 100644
>> --- a/security/landlock/limits.h
>> +++ b/security/landlock/limits.h
>> @@ -26,6 +26,10 @@
>>   #define LANDLOCK_MASK_ACCESS_NET	((LANDLOCK_LAST_ACCESS_NET << 1) - 1)
>>   #define LANDLOCK_NUM_ACCESS_NET		__const_hweight64(LANDLOCK_MASK_ACCESS_NET)
>>   
>> +#define LANDLOCK_LAST_ACCESS_SOCKET	    LANDLOCK_ACCESS_SOCKET_CREATE
>> +#define LANDLOCK_MASK_ACCESS_SOCKET	    ((LANDLOCK_LAST_ACCESS_SOCKET << 1) - 1)
>> +#define LANDLOCK_NUM_ACCESS_SOCKET		__const_hweight64(LANDLOCK_MASK_ACCESS_SOCKET)
>> +
>>   /* clang-format on */
>>   
>>   #endif /* _SECURITY_LANDLOCK_LIMITS_H */
>> diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c
>> index 6ff232f58618..9bf5e5e88544 100644
>> --- a/security/landlock/ruleset.c
>> +++ b/security/landlock/ruleset.c
>> @@ -40,6 +40,7 @@ static struct landlock_ruleset *create_ruleset(const u32 num_layers)
>>   #if IS_ENABLED(CONFIG_INET)
>>   	new_ruleset->root_net_port = RB_ROOT;
>>   #endif /* IS_ENABLED(CONFIG_INET) */
>> +	new_ruleset->root_socket = RB_ROOT;
>>   
>>   	new_ruleset->num_layers = num_layers;
>>   	/*
>> @@ -52,12 +53,13 @@ static struct landlock_ruleset *create_ruleset(const u32 num_layers)
>>   
>>   struct landlock_ruleset *
>>   landlock_create_ruleset(const access_mask_t fs_access_mask,
>> -			const access_mask_t net_access_mask)
>> +			const access_mask_t net_access_mask,
>> +			const access_mask_t socket_access_mask)
>>   {
>>   	struct landlock_ruleset *new_ruleset;
>>   
>>   	/* Informs about useless ruleset. */
>> -	if (!fs_access_mask && !net_access_mask)
>> +	if (!fs_access_mask && !net_access_mask && !socket_access_mask)
>>   		return ERR_PTR(-ENOMSG);
>>   	new_ruleset = create_ruleset(1);
>>   	if (IS_ERR(new_ruleset))
>> @@ -66,6 +68,9 @@ landlock_create_ruleset(const access_mask_t fs_access_mask,
>>   		landlock_add_fs_access_mask(new_ruleset, fs_access_mask, 0);
>>   	if (net_access_mask)
>>   		landlock_add_net_access_mask(new_ruleset, net_access_mask, 0);
>> +	if (socket_access_mask)
>> +		landlock_add_socket_access_mask(new_ruleset, socket_access_mask,
>> +						0);
>>   	return new_ruleset;
>>   }
>>   
>> @@ -89,6 +94,9 @@ static bool is_object_pointer(const enum landlock_key_type key_type)
>>   		return false;
>>   #endif /* IS_ENABLED(CONFIG_INET) */
>>   
>> +	case LANDLOCK_KEY_SOCKET:
>> +		return false;
>> +
>>   	default:
>>   		WARN_ON_ONCE(1);
>>   		return false;
>> @@ -146,6 +154,9 @@ static struct rb_root *get_root(struct landlock_ruleset *const ruleset,
>>   		return &ruleset->root_net_port;
>>   #endif /* IS_ENABLED(CONFIG_INET) */
>>   
>> +	case LANDLOCK_KEY_SOCKET:
>> +		return &ruleset->root_socket;
>> +
>>   	default:
>>   		WARN_ON_ONCE(1);
>>   		return ERR_PTR(-EINVAL);
>> @@ -395,6 +406,11 @@ static int merge_ruleset(struct landlock_ruleset *const dst,
>>   		goto out_unlock;
>>   #endif /* IS_ENABLED(CONFIG_INET) */
>>   
>> +	/* Merges the @src socket tree. */
>> +	err = merge_tree(dst, src, LANDLOCK_KEY_SOCKET);
>> +	if (err)
>> +		goto out_unlock;
>> +
>>   out_unlock:
>>   	mutex_unlock(&src->lock);
>>   	mutex_unlock(&dst->lock);
>> @@ -458,6 +474,11 @@ static int inherit_ruleset(struct landlock_ruleset *const parent,
>>   		goto out_unlock;
>>   #endif /* IS_ENABLED(CONFIG_INET) */
>>   
>> +	/* Copies the @parent socket tree. */
>> +	err = inherit_tree(parent, child, LANDLOCK_KEY_SOCKET);
>> +	if (err)
>> +		goto out_unlock;
>> +
>>   	if (WARN_ON_ONCE(child->num_layers <= parent->num_layers)) {
>>   		err = -EINVAL;
>>   		goto out_unlock;
>> @@ -494,6 +515,10 @@ static void free_ruleset(struct landlock_ruleset *const ruleset)
>>   		free_rule(freeme, LANDLOCK_KEY_NET_PORT);
>>   #endif /* IS_ENABLED(CONFIG_INET) */
>>   
>> +	rbtree_postorder_for_each_entry_safe(freeme, next,
>> +					     &ruleset->root_socket, node)
>> +		free_rule(freeme, LANDLOCK_KEY_SOCKET);
>> +
>>   	put_hierarchy(ruleset->hierarchy);
>>   	kfree(ruleset);
>>   }
>> @@ -704,6 +729,10 @@ landlock_init_layer_masks(const struct landlock_ruleset *const domain,
>>   		break;
>>   #endif /* IS_ENABLED(CONFIG_INET) */
>>   
>> +	case LANDLOCK_KEY_SOCKET:
>> +		get_access_mask = landlock_get_socket_access_mask;
>> +		num_access = LANDLOCK_NUM_ACCESS_SOCKET;
>> +		break;
>>   	default:
>>   		WARN_ON_ONCE(1);
>>   		return 0;
>> diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h
>> index 0f1b5b4c8f6b..5cf7251e11ca 100644
>> --- a/security/landlock/ruleset.h
>> +++ b/security/landlock/ruleset.h
>> @@ -42,6 +42,7 @@ static_assert(sizeof(unsigned long) >= sizeof(access_mask_t));
>>   struct access_masks {
>>   	access_mask_t fs : LANDLOCK_NUM_ACCESS_FS;
>>   	access_mask_t net : LANDLOCK_NUM_ACCESS_NET;
>> +	access_mask_t socket : LANDLOCK_NUM_ACCESS_SOCKET;
>>   };
>>   
>>   typedef u16 layer_mask_t;
>> @@ -92,6 +93,12 @@ enum landlock_key_type {
>>   	 * node keys.
>>   	 */
>>   	LANDLOCK_KEY_NET_PORT,
>> +
>> +	/**
>> +	 * @LANDLOCK_KEY_SOCKET: Type of &landlock_ruleset.root_socket's
>> +	 * node keys.
>> +	 */
>> +	LANDLOCK_KEY_SOCKET,
>>   };
>>   
>>   /**
>> @@ -177,6 +184,15 @@ struct landlock_ruleset {
>>   	struct rb_root root_net_port;
>>   #endif /* IS_ENABLED(CONFIG_INET) */
>>   
>> +	/**
>> +	 * @root_socket: Root of a red-black tree containing &struct
>> +	 * landlock_rule nodes with socket type, described by (family, type)
>> +	 * pair (see socket(2)). Once a ruleset is tied to a
>> +	 * process (i.e. as a domain), this tree is immutable until @usage
>> +	 * reaches zero.
>> +	 */
>> +	struct rb_root root_socket;
>> +
>>   	/**
>>   	 * @hierarchy: Enables hierarchy identification even when a parent
>>   	 * domain vanishes.  This is needed for the ptrace protection.
>> @@ -215,8 +231,10 @@ struct landlock_ruleset {
>>   			 */
>>   			u32 num_layers;
>>   			/**
>> -			 * @access_masks: Contains the subset of filesystem and
>> -			 * network actions that are restricted by a ruleset.
>> +			 * @access_masks: Contains the subset of filesystem,
>> +			 * network and socket actions that are restricted by
>> +			 * a ruleset.
>> +			 *
>>   			 * A domain saves all layers of merged rulesets in a
>>   			 * stack (FAM), starting from the first layer to the
>>   			 * last one.  These layers are used when merging
>> @@ -233,7 +251,8 @@ struct landlock_ruleset {
>>   
>>   struct landlock_ruleset *
>>   landlock_create_ruleset(const access_mask_t access_mask_fs,
>> -			const access_mask_t access_mask_net);
>> +			const access_mask_t access_mask_net,
>> +			const access_mask_t access_mask_socket);
>>   
>>   void landlock_put_ruleset(struct landlock_ruleset *const ruleset);
>>   void landlock_put_ruleset_deferred(struct landlock_ruleset *const ruleset);
>> @@ -280,6 +299,19 @@ landlock_add_net_access_mask(struct landlock_ruleset *const ruleset,
>>   	ruleset->access_masks[layer_level].net |= net_mask;
>>   }
>>   
>> +static inline void
>> +landlock_add_socket_access_mask(struct landlock_ruleset *const ruleset,
>> +				const access_mask_t socket_access_mask,
>> +				const u16 layer_level)
>> +{
>> +	access_mask_t socket_mask = socket_access_mask &
>> +				    LANDLOCK_MASK_ACCESS_SOCKET;
>> +
>> +	/* Should already be checked in sys_landlock_create_ruleset(). */
>> +	WARN_ON_ONCE(socket_access_mask != socket_mask);
>> +	ruleset->access_masks[layer_level].socket |= socket_mask;
>> +}
>> +
>>   static inline access_mask_t
>>   landlock_get_raw_fs_access_mask(const struct landlock_ruleset *const ruleset,
>>   				const u16 layer_level)
>> @@ -303,6 +335,13 @@ landlock_get_net_access_mask(const struct landlock_ruleset *const ruleset,
>>   	return ruleset->access_masks[layer_level].net;
>>   }
>>   
>> +static inline access_mask_t
>> +landlock_get_socket_access_mask(const struct landlock_ruleset *const ruleset,
>> +				const u16 layer_level)
>> +{
>> +	return ruleset->access_masks[layer_level].socket;
>> +}
>> +
>>   bool landlock_unmask_layers(const struct landlock_rule *const rule,
>>   			    const access_mask_t access_request,
>>   			    layer_mask_t (*const layer_masks)[],
>> diff --git a/security/landlock/socket.c b/security/landlock/socket.c
>> new file mode 100644
>> index 000000000000..cad89bb91678
>> --- /dev/null
>> +++ b/security/landlock/socket.c
>> @@ -0,0 +1,69 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Landlock LSM - Socket management and hooks
>> + *
>> + * Copyright © 2024 Huawei Tech. Co., Ltd.
>> + */
>> +
>> +#include <linux/net.h>
>> +#include <linux/socket.h>
>> +#include <linux/stddef.h>
>> +
>> +#include "limits.h"
>> +#include "ruleset.h"
>> +#include "socket.h"
>> +
>> +static uintptr_t pack_socket_key(const int family, const int type)
>> +{
>> +	union {
>> +		struct {
>> +			unsigned short family, type;
>> +		} __packed data;
>> +		unsigned int packed;
>> +	} socket_key;
> 
> Maybe a slightly more obvious way would be to use the u8, u16 and u32 types
> here?  Then it would be more directly visible that we have considered this
> correctly and that not one of the variables has an odd size on an obscure
> platform somewhere.

Agreed, thank you for the suggestion!

> 
>> +
>> +	/*
>> +	 * Checks that all supported socket families and types can be stored
>> +	 * in socket_key.
>> +	 */
>> +	BUILD_BUG_ON(AF_MAX >= (typeof(socket_key.data.family))~0);
>> +	BUILD_BUG_ON(SOCK_MAX >= (typeof(socket_key.data.type))~0);
>> +
>> +	/* Checks that socket_key can be stored in landlock_key. */
>> +	BUILD_BUG_ON(sizeof(socket_key.data) > sizeof(socket_key.packed));
>> +	BUILD_BUG_ON(sizeof(socket_key.packed) >
>> +		     sizeof_field(union landlock_key, data));
>> +
>> +	socket_key.data.family = (unsigned short)family;
>> +	socket_key.data.type = (unsigned short)type;
>> +
>> +	return socket_key.packed;
>> +}
> 
> —Günther
Mikhail Ivanov Nov. 11, 2024, 4:29 p.m. UTC | #3
On 9/4/2024 1:48 PM, Mikhail Ivanov wrote:
> Landlock implements the `LANDLOCK_RULE_NET_PORT` rule type, which provides
> fine-grained control of actions for a specific protocol. Any action or
> protocol that is not supported by this rule can not be controlled. As a
> result, protocols for which fine-grained control is not supported can be
> used in a sandboxed system and lead to vulnerabilities or unexpected
> behavior.
> 
> Controlling the protocols used will allow to use only those that are
> necessary for the system and/or which have fine-grained Landlock control
> through others types of rules (e.g. TCP bind/connect control with
> `LANDLOCK_RULE_NET_PORT`, UNIX bind control with
> `LANDLOCK_RULE_PATH_BENEATH`). Consider following examples:
> 
> * Server may want to use only TCP sockets for which there is fine-grained
>    control of bind(2) and connect(2) actions [1].
> * System that does not need a network or that may want to disable network
>    for security reasons (e.g. [2]) can achieve this by restricting the use
>    of all possible protocols.
> 
> This patch implements such control by restricting socket creation in a
> sandboxed process.
> 
> Add `LANDLOCK_RULE_SOCKET` rule type that restricts actions on sockets.
> This rule uses values of address family and socket type (Cf. socket(2))
> to determine sockets that should be restricted. This is represented in a
> landlock_socket_attr struct:
> 
>    struct landlock_socket_attr {
>      __u64 allowed_access;
>      int family; /* same as domain in socket(2) */
>      int type; /* see socket(2) */
>    };

Hello! I'd like to consider another approach to define this structure
before sending the next version of this patchset.

Currently, it has following possible issues:

First of all, there is a lack of protocol granularity. It's impossible
to (for example) deny creation of ICMP and SCTP sockets and allow TCP
and UDP. Since the values of address family and socket type do not
completely define the protocol for the restriction, we may gain
incomplete control of the network actions. AFAICS, this is limited to
only a couple of IP protocol cases (e.g. it's impossible to deny SCTP
and SMC sockets to only allow TCP, deny ICMP and allow UDP).

But one of the main advantages of socket access rights is the ability to
allow only those protocols for which there is a fine-grained control
over their actions (TCP bind/connect). It can be inconvenient
(and unsafe) for SCTP to be unrestricted, while sandboxed process only
needs TCP sockets.

Adding protocol (Cf. socket(2)) field was considered a bit during the
initial discussion:
https://lore.kernel.org/all/CABi2SkVWU=Wxb2y3fP702twyHBD3kVoySPGSz2X22VckvcHeXw@mail.gmail.com/

Secondly, I'm not really sure if socket type granularity is required
for most of the protocols. It may be more convenient for the end user
to be able to completely restrict the address family without specifying
whether restriction is dedicated to stream or dgram sockets (e.g. for
BLUETOOTH, VSOCK sockets). However, this is not a big issue for the
current design, since address family can be restricted by specifying
type = SOCK_TYPE_MASK.

I suggest implementing something close to selinux socket classes for the
struct landlock_socket_attr (Cf. socket_type_to_security_class()). This
will provide protocol granularity and may be simpler and more convenient
in the terms of determining access rights. WDYT?
Günther Noack Nov. 22, 2024, 5:45 p.m. UTC | #4
Hello Mikhail,

sorry for the delayed response;
I am very happy to see activity on this patch set! :)

On Mon, Nov 11, 2024 at 07:29:49PM +0300, Mikhail Ivanov wrote:
> On 9/4/2024 1:48 PM, Mikhail Ivanov wrote:
> > Landlock implements the `LANDLOCK_RULE_NET_PORT` rule type, which provides
> > fine-grained control of actions for a specific protocol. Any action or
> > protocol that is not supported by this rule can not be controlled. As a
> > result, protocols for which fine-grained control is not supported can be
> > used in a sandboxed system and lead to vulnerabilities or unexpected
> > behavior.
> > 
> > Controlling the protocols used will allow to use only those that are
> > necessary for the system and/or which have fine-grained Landlock control
> > through others types of rules (e.g. TCP bind/connect control with
> > `LANDLOCK_RULE_NET_PORT`, UNIX bind control with
> > `LANDLOCK_RULE_PATH_BENEATH`). Consider following examples:
> > 
> > * Server may want to use only TCP sockets for which there is fine-grained
> >    control of bind(2) and connect(2) actions [1].
> > * System that does not need a network or that may want to disable network
> >    for security reasons (e.g. [2]) can achieve this by restricting the use
> >    of all possible protocols.
> > 
> > This patch implements such control by restricting socket creation in a
> > sandboxed process.
> > 
> > Add `LANDLOCK_RULE_SOCKET` rule type that restricts actions on sockets.
> > This rule uses values of address family and socket type (Cf. socket(2))
> > to determine sockets that should be restricted. This is represented in a
> > landlock_socket_attr struct:
> > 
> >    struct landlock_socket_attr {
> >      __u64 allowed_access;
> >      int family; /* same as domain in socket(2) */
> >      int type; /* see socket(2) */
> >    };
> 
> Hello! I'd like to consider another approach to define this structure
> before sending the next version of this patchset.
> 
> Currently, it has following possible issues:
> 
> First of all, there is a lack of protocol granularity. It's impossible
> to (for example) deny creation of ICMP and SCTP sockets and allow TCP
> and UDP. Since the values of address family and socket type do not
> completely define the protocol for the restriction, we may gain
> incomplete control of the network actions. AFAICS, this is limited to
> only a couple of IP protocol cases (e.g. it's impossible to deny SCTP
> and SMC sockets to only allow TCP, deny ICMP and allow UDP).
> 
> But one of the main advantages of socket access rights is the ability to
> allow only those protocols for which there is a fine-grained control
> over their actions (TCP bind/connect). It can be inconvenient
> (and unsafe) for SCTP to be unrestricted, while sandboxed process only
> needs TCP sockets.

That is a good observation which I had missed.

I agree with your analysis, I also see the main use case of socket()
restrictions in:

 (a) restricting socket creating altogether
 (b) only permitting socket types for which there is fine grained control

and I also agree that it would be very surprising when the same socket types
that provide fine grained control would also open the door for unrestricted
access to SMC, SCTP or other protocols.  We should instead strive for a
socket() access control with which these additional protocols weren't
accessible.


> Adding protocol (Cf. socket(2)) field was considered a bit during the
> initial discussion:
> https://lore.kernel.org/all/CABi2SkVWU=Wxb2y3fP702twyHBD3kVoySPGSz2X22VckvcHeXw@mail.gmail.com/

So adding "protocol" to the rule attributes would suffice to restrict the use of
SMC and SCTP then?  (Sorry, I lost context on these protocols a bit in the
meantime, I was so far under the impression that these were using different
values for family and type than TCP and UDP do.)


> Secondly, I'm not really sure if socket type granularity is required
> for most of the protocols. It may be more convenient for the end user
> to be able to completely restrict the address family without specifying
> whether restriction is dedicated to stream or dgram sockets (e.g. for
> BLUETOOTH, VSOCK sockets). However, this is not a big issue for the
> current design, since address family can be restricted by specifying
> type = SOCK_TYPE_MASK.

Whether the user is adding one rule to permit AF_INET+*, or whether the user is
adding two rules to permit (1) AF_INET+SOCK_STREAM and (2) AF_INET+SOCK_DGRAM,
that does not seem like a big deal to me as long as the list of such
combinations is so low?


> I suggest implementing something close to selinux socket classes for the
> struct landlock_socket_attr (Cf. socket_type_to_security_class()). This
> will provide protocol granularity and may be simpler and more convenient
> in the terms of determining access rights. WDYT?

I see that this is a longer switch statement that maps to this enum, it would be
an additional data table that would have to be documented separately for users.

Do you have an example for how such a "security class enum" would map to the
combinations of family, type and socket for the protocols discussed above?

If this is just a matter of actually mapping (family, type, protocol)
combinations in a more flexible way, could we get away by allowing a special
"wildcard" value for the "protocol" field, when it is used within a ruleset?
Then the LSM would have to look up whether there is a rule for (family, type,
protocol) and the only change would be that it now needs to also check whether
there is a rule for (family, type, *)?

—Günther
Mikhail Ivanov Nov. 25, 2024, 11:04 a.m. UTC | #5
On 11/22/2024 8:45 PM, Günther Noack wrote:
> Hello Mikhail,
> 
> sorry for the delayed response;
> I am very happy to see activity on this patch set! :)

Hello Günther,
No problem, thanks a lot for your feedback!

> 
> On Mon, Nov 11, 2024 at 07:29:49PM +0300, Mikhail Ivanov wrote:
>> On 9/4/2024 1:48 PM, Mikhail Ivanov wrote:
>>> Landlock implements the `LANDLOCK_RULE_NET_PORT` rule type, which provides
>>> fine-grained control of actions for a specific protocol. Any action or
>>> protocol that is not supported by this rule can not be controlled. As a
>>> result, protocols for which fine-grained control is not supported can be
>>> used in a sandboxed system and lead to vulnerabilities or unexpected
>>> behavior.
>>>
>>> Controlling the protocols used will allow to use only those that are
>>> necessary for the system and/or which have fine-grained Landlock control
>>> through others types of rules (e.g. TCP bind/connect control with
>>> `LANDLOCK_RULE_NET_PORT`, UNIX bind control with
>>> `LANDLOCK_RULE_PATH_BENEATH`). Consider following examples:
>>>
>>> * Server may want to use only TCP sockets for which there is fine-grained
>>>     control of bind(2) and connect(2) actions [1].
>>> * System that does not need a network or that may want to disable network
>>>     for security reasons (e.g. [2]) can achieve this by restricting the use
>>>     of all possible protocols.
>>>
>>> This patch implements such control by restricting socket creation in a
>>> sandboxed process.
>>>
>>> Add `LANDLOCK_RULE_SOCKET` rule type that restricts actions on sockets.
>>> This rule uses values of address family and socket type (Cf. socket(2))
>>> to determine sockets that should be restricted. This is represented in a
>>> landlock_socket_attr struct:
>>>
>>>     struct landlock_socket_attr {
>>>       __u64 allowed_access;
>>>       int family; /* same as domain in socket(2) */
>>>       int type; /* see socket(2) */
>>>     };
>>
>> Hello! I'd like to consider another approach to define this structure
>> before sending the next version of this patchset.
>>
>> Currently, it has following possible issues:
>>
>> First of all, there is a lack of protocol granularity. It's impossible
>> to (for example) deny creation of ICMP and SCTP sockets and allow TCP
>> and UDP. Since the values of address family and socket type do not
>> completely define the protocol for the restriction, we may gain
>> incomplete control of the network actions. AFAICS, this is limited to
>> only a couple of IP protocol cases (e.g. it's impossible to deny SCTP
>> and SMC sockets to only allow TCP, deny ICMP and allow UDP).
>>
>> But one of the main advantages of socket access rights is the ability to
>> allow only those protocols for which there is a fine-grained control
>> over their actions (TCP bind/connect). It can be inconvenient
>> (and unsafe) for SCTP to be unrestricted, while sandboxed process only
>> needs TCP sockets.
> 
> That is a good observation which I had missed.
> 
> I agree with your analysis, I also see the main use case of socket()
> restrictions in:
> 
>   (a) restricting socket creating altogether
>   (b) only permitting socket types for which there is fine grained control
> 
> and I also agree that it would be very surprising when the same socket types
> that provide fine grained control would also open the door for unrestricted
> access to SMC, SCTP or other protocols.  We should instead strive for a
> socket() access control with which these additional protocols weren't
> accessible.
> 
> 
>> Adding protocol (Cf. socket(2)) field was considered a bit during the
>> initial discussion:
>> https://lore.kernel.org/all/CABi2SkVWU=Wxb2y3fP702twyHBD3kVoySPGSz2X22VckvcHeXw@mail.gmail.com/
> 
> So adding "protocol" to the rule attributes would suffice to restrict the use of
> SMC and SCTP then?  (Sorry, I lost context on these protocols a bit in the
> meantime, I was so far under the impression that these were using different
> values for family and type than TCP and UDP do.)

Yeap. Following rule will be enough to allow TCP sockets only:

const struct landlock_socket_attr create_socket_attr = {
	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
	.family = AF_INET{,6},
	.type = SOCK_STREAM,
	.protocol = 0
};

Btw, creation of SMC sockets via IP stack was added quite recently.
So far, creation has been possible only with AF_SMC family.

https://lore.kernel.org/all/1718301630-63692-1-git-send-email-alibuda@linux.alibaba.com/

> 
> 
>> Secondly, I'm not really sure if socket type granularity is required
>> for most of the protocols. It may be more convenient for the end user
>> to be able to completely restrict the address family without specifying
>> whether restriction is dedicated to stream or dgram sockets (e.g. for
>> BLUETOOTH, VSOCK sockets). However, this is not a big issue for the
>> current design, since address family can be restricted by specifying
>> type = SOCK_TYPE_MASK.
> 
> Whether the user is adding one rule to permit AF_INET+*, or whether the user is
> adding two rules to permit (1) AF_INET+SOCK_STREAM and (2) AF_INET+SOCK_DGRAM,
> that does not seem like a big deal to me as long as the list of such
> combinations is so low?

Agreed

> 
> 
>> I suggest implementing something close to selinux socket classes for the
>> struct landlock_socket_attr (Cf. socket_type_to_security_class()). This
>> will provide protocol granularity and may be simpler and more convenient
>> in the terms of determining access rights. WDYT?
> 
> I see that this is a longer switch statement that maps to this enum, it would be
> an additional data table that would have to be documented separately for users.

This table is the general drawback, since it makes API a bit more
complex.

> 
> Do you have an example for how such a "security class enum" would map to the
> combinations of family, type and socket for the protocols discussed above?

I think the socket_type_to_security_class() has a pretty good mapping
for UNIX and IP families.

> 
> If this is just a matter of actually mapping (family, type, protocol)
> combinations in a more flexible way, could we get away by allowing a special
> "wildcard" value for the "protocol" field, when it is used within a ruleset?
> Then the LSM would have to look up whether there is a rule for (family, type,
> protocol) and the only change would be that it now needs to also check whether
> there is a rule for (family, type, *)?

Something like this?

const struct landlock_socket_attr create_socket_attr = {
	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
	.family = AF_INET6,
	.type = SOCK_DGRAM,
	.protocol = LANDLOCK_SOCKET_PROTO_ALL
};

> 
> —Günther
Mickaël Salaün Nov. 27, 2024, 6:43 p.m. UTC | #6
On Mon, Nov 25, 2024 at 02:04:09PM +0300, Mikhail Ivanov wrote:
> On 11/22/2024 8:45 PM, Günther Noack wrote:
> > Hello Mikhail,
> > 
> > sorry for the delayed response;
> > I am very happy to see activity on this patch set! :)
> 
> Hello Günther,
> No problem, thanks a lot for your feedback!
> 
> > 
> > On Mon, Nov 11, 2024 at 07:29:49PM +0300, Mikhail Ivanov wrote:
> > > On 9/4/2024 1:48 PM, Mikhail Ivanov wrote:
> > > > Landlock implements the `LANDLOCK_RULE_NET_PORT` rule type, which provides
> > > > fine-grained control of actions for a specific protocol. Any action or
> > > > protocol that is not supported by this rule can not be controlled. As a
> > > > result, protocols for which fine-grained control is not supported can be
> > > > used in a sandboxed system and lead to vulnerabilities or unexpected
> > > > behavior.
> > > > 
> > > > Controlling the protocols used will allow to use only those that are
> > > > necessary for the system and/or which have fine-grained Landlock control
> > > > through others types of rules (e.g. TCP bind/connect control with
> > > > `LANDLOCK_RULE_NET_PORT`, UNIX bind control with
> > > > `LANDLOCK_RULE_PATH_BENEATH`). Consider following examples:
> > > > 
> > > > * Server may want to use only TCP sockets for which there is fine-grained
> > > >     control of bind(2) and connect(2) actions [1].
> > > > * System that does not need a network or that may want to disable network
> > > >     for security reasons (e.g. [2]) can achieve this by restricting the use
> > > >     of all possible protocols.
> > > > 
> > > > This patch implements such control by restricting socket creation in a
> > > > sandboxed process.
> > > > 
> > > > Add `LANDLOCK_RULE_SOCKET` rule type that restricts actions on sockets.
> > > > This rule uses values of address family and socket type (Cf. socket(2))
> > > > to determine sockets that should be restricted. This is represented in a
> > > > landlock_socket_attr struct:
> > > > 
> > > >     struct landlock_socket_attr {
> > > >       __u64 allowed_access;
> > > >       int family; /* same as domain in socket(2) */
> > > >       int type; /* see socket(2) */
> > > >     };
> > > 
> > > Hello! I'd like to consider another approach to define this structure
> > > before sending the next version of this patchset.
> > > 
> > > Currently, it has following possible issues:
> > > 
> > > First of all, there is a lack of protocol granularity. It's impossible
> > > to (for example) deny creation of ICMP and SCTP sockets and allow TCP
> > > and UDP. Since the values of address family and socket type do not
> > > completely define the protocol for the restriction, we may gain
> > > incomplete control of the network actions. AFAICS, this is limited to
> > > only a couple of IP protocol cases (e.g. it's impossible to deny SCTP
> > > and SMC sockets to only allow TCP, deny ICMP and allow UDP).
> > > 
> > > But one of the main advantages of socket access rights is the ability to
> > > allow only those protocols for which there is a fine-grained control
> > > over their actions (TCP bind/connect). It can be inconvenient
> > > (and unsafe) for SCTP to be unrestricted, while sandboxed process only
> > > needs TCP sockets.
> > 
> > That is a good observation which I had missed.
> > 
> > I agree with your analysis, I also see the main use case of socket()
> > restrictions in:
> > 
> >   (a) restricting socket creating altogether
> >   (b) only permitting socket types for which there is fine grained control
> > 
> > and I also agree that it would be very surprising when the same socket types
> > that provide fine grained control would also open the door for unrestricted
> > access to SMC, SCTP or other protocols.  We should instead strive for a
> > socket() access control with which these additional protocols weren't
> > accessible.
> > 
> > 
> > > Adding protocol (Cf. socket(2)) field was considered a bit during the
> > > initial discussion:
> > > https://lore.kernel.org/all/CABi2SkVWU=Wxb2y3fP702twyHBD3kVoySPGSz2X22VckvcHeXw@mail.gmail.com/
> > 
> > So adding "protocol" to the rule attributes would suffice to restrict the use of
> > SMC and SCTP then?  (Sorry, I lost context on these protocols a bit in the
> > meantime, I was so far under the impression that these were using different
> > values for family and type than TCP and UDP do.)
> 
> Yeap. Following rule will be enough to allow TCP sockets only:
> 
> const struct landlock_socket_attr create_socket_attr = {
> 	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
> 	.family = AF_INET{,6},
> 	.type = SOCK_STREAM,
> 	.protocol = 0
> };

We should indeed include the protocol type in the rule definition.

> 
> Btw, creation of SMC sockets via IP stack was added quite recently.
> So far, creation has been possible only with AF_SMC family.
> 
> https://lore.kernel.org/all/1718301630-63692-1-git-send-email-alibuda@linux.alibaba.com/
> 
> > 
> > 
> > > Secondly, I'm not really sure if socket type granularity is required
> > > for most of the protocols. It may be more convenient for the end user
> > > to be able to completely restrict the address family without specifying
> > > whether restriction is dedicated to stream or dgram sockets (e.g. for
> > > BLUETOOTH, VSOCK sockets). However, this is not a big issue for the
> > > current design, since address family can be restricted by specifying
> > > type = SOCK_TYPE_MASK.

It looks like SOCK_TYPE_MASK is not part of UAPI, which means it could
change with kernel versions (even while being in UAPI in fact).  This
new socket creation control should allow to deny any socket creation
known or unknow at the time of the user space program build, and
whatever the available C headers.

This also means that Landlock should accept any domain, type, and
protocols defined in rules.  Indeed, we don't want to reject rules for
which some protocols are not allowed.

What about using bitmasks for the domain and type fields (renamed to
"domains" and "types")?  The last protocol is currently 45/MCTP so a
64-bit field is enough, and 10/SOCK_PACKET also fits for the last socket
type.

We cannot do the same with the protocol because the higher one is
262/MPTCP though.  But it looks like a value of 0 (default protocol)
should be enough for most use cases, and users could specify a protocol
(but this time as a number, not a bitmask).

To sum up, we could have something like this:

  const struct landlock_socket_attr create_socket_attr = {
  	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
  	.families = 1 << AF_INET | 1 << AF_INET6,
  	.types = 1 << SOCK_STREAM,
  	.protocol = IPPROTO_SCTP
  };


> > 
> > Whether the user is adding one rule to permit AF_INET+*, or whether the user is
> > adding two rules to permit (1) AF_INET+SOCK_STREAM and (2) AF_INET+SOCK_DGRAM,
> > that does not seem like a big deal to me as long as the list of such
> > combinations is so low?
> 
> Agreed

I also agree, but this might change if users have to set a combination
of families, types, and protocols.  This should be OK with the bitmask
approach though.

> 
> > 
> > 
> > > I suggest implementing something close to selinux socket classes for the
> > > struct landlock_socket_attr (Cf. socket_type_to_security_class()). This
> > > will provide protocol granularity and may be simpler and more convenient
> > > in the terms of determining access rights. WDYT?
> > 
> > I see that this is a longer switch statement that maps to this enum, it would be
> > an additional data table that would have to be documented separately for users.
> 
> This table is the general drawback, since it makes API a bit more
> complex.
> 
> > 
> > Do you have an example for how such a "security class enum" would map to the
> > combinations of family, type and socket for the protocols discussed above?
> 
> I think the socket_type_to_security_class() has a pretty good mapping
> for UNIX and IP families.

The mapping looks good indeed, and it has been tested for a long time
with many applications.  However, this would make the kernel
implementation more complex, and I think this mapping could easily be
implemented in user space libraries with the bitmask approach, if really
needed, which I'm not sure.

> 
> > 
> > If this is just a matter of actually mapping (family, type, protocol)
> > combinations in a more flexible way, could we get away by allowing a special
> > "wildcard" value for the "protocol" field, when it is used within a ruleset?
> > Then the LSM would have to look up whether there is a rule for (family, type,
> > protocol) and the only change would be that it now needs to also check whether
> > there is a rule for (family, type, *)?
> 
> Something like this?
> 
> const struct landlock_socket_attr create_socket_attr = {
> 	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
> 	.family = AF_INET6,
> 	.type = SOCK_DGRAM,
> 	.protocol = LANDLOCK_SOCKET_PROTO_ALL
> };
> 
> > 
> > —Günther
>
Mikhail Ivanov Nov. 28, 2024, 12:01 p.m. UTC | #7
On 11/27/2024 9:43 PM, Mickaël Salaün wrote:
> On Mon, Nov 25, 2024 at 02:04:09PM +0300, Mikhail Ivanov wrote:
>> On 11/22/2024 8:45 PM, Günther Noack wrote:
>>> Hello Mikhail,
>>>
>>> sorry for the delayed response;
>>> I am very happy to see activity on this patch set! :)
>>
>> Hello Günther,
>> No problem, thanks a lot for your feedback!
>>
>>>
>>> On Mon, Nov 11, 2024 at 07:29:49PM +0300, Mikhail Ivanov wrote:
>>>> On 9/4/2024 1:48 PM, Mikhail Ivanov wrote:
>>>>> Landlock implements the `LANDLOCK_RULE_NET_PORT` rule type, which provides
>>>>> fine-grained control of actions for a specific protocol. Any action or
>>>>> protocol that is not supported by this rule can not be controlled. As a
>>>>> result, protocols for which fine-grained control is not supported can be
>>>>> used in a sandboxed system and lead to vulnerabilities or unexpected
>>>>> behavior.
>>>>>
>>>>> Controlling the protocols used will allow to use only those that are
>>>>> necessary for the system and/or which have fine-grained Landlock control
>>>>> through others types of rules (e.g. TCP bind/connect control with
>>>>> `LANDLOCK_RULE_NET_PORT`, UNIX bind control with
>>>>> `LANDLOCK_RULE_PATH_BENEATH`). Consider following examples:
>>>>>
>>>>> * Server may want to use only TCP sockets for which there is fine-grained
>>>>>      control of bind(2) and connect(2) actions [1].
>>>>> * System that does not need a network or that may want to disable network
>>>>>      for security reasons (e.g. [2]) can achieve this by restricting the use
>>>>>      of all possible protocols.
>>>>>
>>>>> This patch implements such control by restricting socket creation in a
>>>>> sandboxed process.
>>>>>
>>>>> Add `LANDLOCK_RULE_SOCKET` rule type that restricts actions on sockets.
>>>>> This rule uses values of address family and socket type (Cf. socket(2))
>>>>> to determine sockets that should be restricted. This is represented in a
>>>>> landlock_socket_attr struct:
>>>>>
>>>>>      struct landlock_socket_attr {
>>>>>        __u64 allowed_access;
>>>>>        int family; /* same as domain in socket(2) */
>>>>>        int type; /* see socket(2) */
>>>>>      };
>>>>
>>>> Hello! I'd like to consider another approach to define this structure
>>>> before sending the next version of this patchset.
>>>>
>>>> Currently, it has following possible issues:
>>>>
>>>> First of all, there is a lack of protocol granularity. It's impossible
>>>> to (for example) deny creation of ICMP and SCTP sockets and allow TCP
>>>> and UDP. Since the values of address family and socket type do not
>>>> completely define the protocol for the restriction, we may gain
>>>> incomplete control of the network actions. AFAICS, this is limited to
>>>> only a couple of IP protocol cases (e.g. it's impossible to deny SCTP
>>>> and SMC sockets to only allow TCP, deny ICMP and allow UDP).
>>>>
>>>> But one of the main advantages of socket access rights is the ability to
>>>> allow only those protocols for which there is a fine-grained control
>>>> over their actions (TCP bind/connect). It can be inconvenient
>>>> (and unsafe) for SCTP to be unrestricted, while sandboxed process only
>>>> needs TCP sockets.
>>>
>>> That is a good observation which I had missed.
>>>
>>> I agree with your analysis, I also see the main use case of socket()
>>> restrictions in:
>>>
>>>    (a) restricting socket creating altogether
>>>    (b) only permitting socket types for which there is fine grained control
>>>
>>> and I also agree that it would be very surprising when the same socket types
>>> that provide fine grained control would also open the door for unrestricted
>>> access to SMC, SCTP or other protocols.  We should instead strive for a
>>> socket() access control with which these additional protocols weren't
>>> accessible.
>>>
>>>
>>>> Adding protocol (Cf. socket(2)) field was considered a bit during the
>>>> initial discussion:
>>>> https://lore.kernel.org/all/CABi2SkVWU=Wxb2y3fP702twyHBD3kVoySPGSz2X22VckvcHeXw@mail.gmail.com/
>>>
>>> So adding "protocol" to the rule attributes would suffice to restrict the use of
>>> SMC and SCTP then?  (Sorry, I lost context on these protocols a bit in the
>>> meantime, I was so far under the impression that these were using different
>>> values for family and type than TCP and UDP do.)
>>
>> Yeap. Following rule will be enough to allow TCP sockets only:
>>
>> const struct landlock_socket_attr create_socket_attr = {
>> 	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
>> 	.family = AF_INET{,6},
>> 	.type = SOCK_STREAM,
>> 	.protocol = 0
>> };
> 
> We should indeed include the protocol type in the rule definition.
> 
>>
>> Btw, creation of SMC sockets via IP stack was added quite recently.
>> So far, creation has been possible only with AF_SMC family.
>>
>> https://lore.kernel.org/all/1718301630-63692-1-git-send-email-alibuda@linux.alibaba.com/
>>
>>>
>>>
>>>> Secondly, I'm not really sure if socket type granularity is required
>>>> for most of the protocols. It may be more convenient for the end user
>>>> to be able to completely restrict the address family without specifying
>>>> whether restriction is dedicated to stream or dgram sockets (e.g. for
>>>> BLUETOOTH, VSOCK sockets). However, this is not a big issue for the
>>>> current design, since address family can be restricted by specifying
>>>> type = SOCK_TYPE_MASK.
> 
> It looks like SOCK_TYPE_MASK is not part of UAPI, which means it could
> change with kernel versions (even while being in UAPI in fact).  This
> new socket creation control should allow to deny any socket creation
> known or unknow at the time of the user space program build, and
> whatever the available C headers.

Agreed

> 
> This also means that Landlock should accept any domain, type, and
> protocols defined in rules.  Indeed, we don't want to reject rules for
> which some protocols are not allowed.

Do you mean that Landlock should not make any assumptions about this
values during a build time? Currently, patchset provides boundary checks
for domain (< AF_MAX) and type (< SOCK_MAX) in landlock_add_rule().

> 
> What about using bitmasks for the domain and type fields (renamed to
> "domains" and "types")?  The last protocol is currently 45/MCTP so a
> 64-bit field is enough, and 10/SOCK_PACKET also fits for the last socket
> type.
> 
> We cannot do the same with the protocol because the higher one is
> 262/MPTCP though.  But it looks like a value of 0 (default protocol)
> should be enough for most use cases, and users could specify a protocol
> (but this time as a number, not a bitmask).
> 
> To sum up, we could have something like this:
> 
>    const struct landlock_socket_attr create_socket_attr = {
>    	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
>    	.families = 1 << AF_INET | 1 << AF_INET6,
>    	.types = 1 << SOCK_STREAM,
>    	.protocol = IPPROTO_SCTP
>    };

Looks good! I think it's a nice approach which will provide a sufficient
level of flexibility to define a single rule for a specific protocol (or
for related protocols).

But, this adds possibility to define a single rule for the set of
unrelated protocols:

/* Allows TCP, UDP and UNIX sockets. */
const struct landlock_socket_attr create_socket_attr = {
	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
	.families = 1 << AF_INET | 1 << AF_INET6 | 1 << AF_UNIX,
	.types = 1 << SOCK_STREAM | 1 << SOCK_DGRAM,
	.protocol = 0
};

Perhaps limiting the addition of one rule to only one address family
would be more clear in terms of rule semantics?:

/* Allows TCP, UDP, UNIX STREAM, UNIX DGRAM sockets. */
const struct landlock_socket_attr create_socket_attrs[] = {
	{
		/* Allows IPv4 TCP and UDP sockets. */
		.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
		.family = AF_INET,
		.types = 1 << SOCK_STREAM | 1 << SOCK_DGRAM,
		.protocol = 0
	},
	{
		/* Allows IPv6 TCP and UDP sockets. */
		.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
		.family = AF_INET6,
		.types = 1 << SOCK_STREAM | 1 << SOCK_DGRAM,
		.protocol = 0
	},
	{
		/* Allows UNIX sockets. */
		.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
		.family = AF_UNIX,
		.types = 1 << SOCK_STREAM | 1 << SOCK_DGRAM,
		.protocol = 0
	},
};

> 
> 
>>>
>>> Whether the user is adding one rule to permit AF_INET+*, or whether the user is
>>> adding two rules to permit (1) AF_INET+SOCK_STREAM and (2) AF_INET+SOCK_DGRAM,
>>> that does not seem like a big deal to me as long as the list of such
>>> combinations is so low?
>>
>> Agreed
> 
> I also agree, but this might change if users have to set a combination
> of families, types, and protocols.  This should be OK with the bitmask
> approach though.
> 
>>
>>>
>>>
>>>> I suggest implementing something close to selinux socket classes for the
>>>> struct landlock_socket_attr (Cf. socket_type_to_security_class()). This
>>>> will provide protocol granularity and may be simpler and more convenient
>>>> in the terms of determining access rights. WDYT?
>>>
>>> I see that this is a longer switch statement that maps to this enum, it would be
>>> an additional data table that would have to be documented separately for users.
>>
>> This table is the general drawback, since it makes API a bit more
>> complex.
>>
>>>
>>> Do you have an example for how such a "security class enum" would map to the
>>> combinations of family, type and socket for the protocols discussed above?
>>
>> I think the socket_type_to_security_class() has a pretty good mapping
>> for UNIX and IP families.
> 
> The mapping looks good indeed, and it has been tested for a long time
> with many applications.  However, this would make the kernel
> implementation more complex, and I think this mapping could easily be
> implemented in user space libraries with the bitmask approach, if really
> needed, which I'm not sure.

I agree, implementing this in a library is a better approach. Thanks for
the catch!

> 
>>
>>>
>>> If this is just a matter of actually mapping (family, type, protocol)
>>> combinations in a more flexible way, could we get away by allowing a special
>>> "wildcard" value for the "protocol" field, when it is used within a ruleset?
>>> Then the LSM would have to look up whether there is a rule for (family, type,
>>> protocol) and the only change would be that it now needs to also check whether
>>> there is a rule for (family, type, *)?
>>
>> Something like this?
>>
>> const struct landlock_socket_attr create_socket_attr = {
>> 	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
>> 	.family = AF_INET6,
>> 	.type = SOCK_DGRAM,
>> 	.protocol = LANDLOCK_SOCKET_PROTO_ALL
>> };
>>
>>>
>>> —Günther
>>
Mickaël Salaün Nov. 28, 2024, 8:52 p.m. UTC | #8
On Thu, Nov 28, 2024 at 03:01:52PM +0300, Mikhail Ivanov wrote:
> On 11/27/2024 9:43 PM, Mickaël Salaün wrote:
> > On Mon, Nov 25, 2024 at 02:04:09PM +0300, Mikhail Ivanov wrote:
> > > On 11/22/2024 8:45 PM, Günther Noack wrote:
> > > > Hello Mikhail,
> > > > 
> > > > sorry for the delayed response;
> > > > I am very happy to see activity on this patch set! :)
> > > 
> > > Hello Günther,
> > > No problem, thanks a lot for your feedback!
> > > 
> > > > 
> > > > On Mon, Nov 11, 2024 at 07:29:49PM +0300, Mikhail Ivanov wrote:
> > > > > On 9/4/2024 1:48 PM, Mikhail Ivanov wrote:
> > > > > > Landlock implements the `LANDLOCK_RULE_NET_PORT` rule type, which provides
> > > > > > fine-grained control of actions for a specific protocol. Any action or
> > > > > > protocol that is not supported by this rule can not be controlled. As a
> > > > > > result, protocols for which fine-grained control is not supported can be
> > > > > > used in a sandboxed system and lead to vulnerabilities or unexpected
> > > > > > behavior.
> > > > > > 
> > > > > > Controlling the protocols used will allow to use only those that are
> > > > > > necessary for the system and/or which have fine-grained Landlock control
> > > > > > through others types of rules (e.g. TCP bind/connect control with
> > > > > > `LANDLOCK_RULE_NET_PORT`, UNIX bind control with
> > > > > > `LANDLOCK_RULE_PATH_BENEATH`). Consider following examples:
> > > > > > 
> > > > > > * Server may want to use only TCP sockets for which there is fine-grained
> > > > > >      control of bind(2) and connect(2) actions [1].
> > > > > > * System that does not need a network or that may want to disable network
> > > > > >      for security reasons (e.g. [2]) can achieve this by restricting the use
> > > > > >      of all possible protocols.
> > > > > > 
> > > > > > This patch implements such control by restricting socket creation in a
> > > > > > sandboxed process.
> > > > > > 
> > > > > > Add `LANDLOCK_RULE_SOCKET` rule type that restricts actions on sockets.
> > > > > > This rule uses values of address family and socket type (Cf. socket(2))
> > > > > > to determine sockets that should be restricted. This is represented in a
> > > > > > landlock_socket_attr struct:
> > > > > > 
> > > > > >      struct landlock_socket_attr {
> > > > > >        __u64 allowed_access;
> > > > > >        int family; /* same as domain in socket(2) */
> > > > > >        int type; /* see socket(2) */
> > > > > >      };
> > > > > 
> > > > > Hello! I'd like to consider another approach to define this structure
> > > > > before sending the next version of this patchset.
> > > > > 
> > > > > Currently, it has following possible issues:
> > > > > 
> > > > > First of all, there is a lack of protocol granularity. It's impossible
> > > > > to (for example) deny creation of ICMP and SCTP sockets and allow TCP
> > > > > and UDP. Since the values of address family and socket type do not
> > > > > completely define the protocol for the restriction, we may gain
> > > > > incomplete control of the network actions. AFAICS, this is limited to
> > > > > only a couple of IP protocol cases (e.g. it's impossible to deny SCTP
> > > > > and SMC sockets to only allow TCP, deny ICMP and allow UDP).
> > > > > 
> > > > > But one of the main advantages of socket access rights is the ability to
> > > > > allow only those protocols for which there is a fine-grained control
> > > > > over their actions (TCP bind/connect). It can be inconvenient
> > > > > (and unsafe) for SCTP to be unrestricted, while sandboxed process only
> > > > > needs TCP sockets.
> > > > 
> > > > That is a good observation which I had missed.
> > > > 
> > > > I agree with your analysis, I also see the main use case of socket()
> > > > restrictions in:
> > > > 
> > > >    (a) restricting socket creating altogether
> > > >    (b) only permitting socket types for which there is fine grained control
> > > > 
> > > > and I also agree that it would be very surprising when the same socket types
> > > > that provide fine grained control would also open the door for unrestricted
> > > > access to SMC, SCTP or other protocols.  We should instead strive for a
> > > > socket() access control with which these additional protocols weren't
> > > > accessible.
> > > > 
> > > > 
> > > > > Adding protocol (Cf. socket(2)) field was considered a bit during the
> > > > > initial discussion:
> > > > > https://lore.kernel.org/all/CABi2SkVWU=Wxb2y3fP702twyHBD3kVoySPGSz2X22VckvcHeXw@mail.gmail.com/
> > > > 
> > > > So adding "protocol" to the rule attributes would suffice to restrict the use of
> > > > SMC and SCTP then?  (Sorry, I lost context on these protocols a bit in the
> > > > meantime, I was so far under the impression that these were using different
> > > > values for family and type than TCP and UDP do.)
> > > 
> > > Yeap. Following rule will be enough to allow TCP sockets only:
> > > 
> > > const struct landlock_socket_attr create_socket_attr = {
> > > 	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
> > > 	.family = AF_INET{,6},
> > > 	.type = SOCK_STREAM,
> > > 	.protocol = 0
> > > };
> > 
> > We should indeed include the protocol type in the rule definition.
> > 
> > > 
> > > Btw, creation of SMC sockets via IP stack was added quite recently.
> > > So far, creation has been possible only with AF_SMC family.
> > > 
> > > https://lore.kernel.org/all/1718301630-63692-1-git-send-email-alibuda@linux.alibaba.com/
> > > 
> > > > 
> > > > 
> > > > > Secondly, I'm not really sure if socket type granularity is required
> > > > > for most of the protocols. It may be more convenient for the end user
> > > > > to be able to completely restrict the address family without specifying
> > > > > whether restriction is dedicated to stream or dgram sockets (e.g. for
> > > > > BLUETOOTH, VSOCK sockets). However, this is not a big issue for the
> > > > > current design, since address family can be restricted by specifying
> > > > > type = SOCK_TYPE_MASK.
> > 
> > It looks like SOCK_TYPE_MASK is not part of UAPI, which means it could
> > change with kernel versions (even while being in UAPI in fact).  This
> > new socket creation control should allow to deny any socket creation
> > known or unknow at the time of the user space program build, and
> > whatever the available C headers.
> 
> Agreed
> 
> > 
> > This also means that Landlock should accept any domain, type, and
> > protocols defined in rules.  Indeed, we don't want to reject rules for
> > which some protocols are not allowed.
> 
> Do you mean that Landlock should not make any assumptions about this
> values during a build time? Currently, patchset provides boundary checks
> for domain (< AF_MAX) and type (< SOCK_MAX) in landlock_add_rule().

The *running kernel* may not support some socket's domains or types,
which may be confusing for users if the rule was tested on a kernel
supporting such domains/types.

For the bitmask of domains or types, the issues to keep boundary checks
would be when a subset of them is not supported.  Landlock would reject
such rule and it would be difficult for users to identify the cause.

I'm still wondering if the landlock_append_net_rule()'s -EAFNOSUPPORT
return value for kernels without CONFIG_INET was a good idea.  We should
probably return 0 in this case, which would be similar to not checking
socket's domains nor types.

> 
> > 
> > What about using bitmasks for the domain and type fields (renamed to
> > "domains" and "types")?  The last protocol is currently 45/MCTP so a
> > 64-bit field is enough, and 10/SOCK_PACKET also fits for the last socket
> > type.
> > 
> > We cannot do the same with the protocol because the higher one is
> > 262/MPTCP though.  But it looks like a value of 0 (default protocol)
> > should be enough for most use cases, and users could specify a protocol
> > (but this time as a number, not a bitmask).
> > 
> > To sum up, we could have something like this:
> > 
> >    const struct landlock_socket_attr create_socket_attr = {
> >    	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
> >    	.families = 1 << AF_INET | 1 << AF_INET6,
> >    	.types = 1 << SOCK_STREAM,
> >    	.protocol = IPPROTO_SCTP
> >    };
> 
> Looks good! I think it's a nice approach which will provide a sufficient
> level of flexibility to define a single rule for a specific protocol (or
> for related protocols).
> 
> But, this adds possibility to define a single rule for the set of
> unrelated protocols:
> 
> /* Allows TCP, UDP and UNIX sockets. */
> const struct landlock_socket_attr create_socket_attr = {
> 	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
> 	.families = 1 << AF_INET | 1 << AF_INET6 | 1 << AF_UNIX,
> 	.types = 1 << SOCK_STREAM | 1 << SOCK_DGRAM,
> 	.protocol = 0
> };
> 
> Perhaps limiting the addition of one rule to only one address family
> would be more clear in terms of rule semantics?:
> 
> /* Allows TCP, UDP, UNIX STREAM, UNIX DGRAM sockets. */
> const struct landlock_socket_attr create_socket_attrs[] = {
> 	{
> 		/* Allows IPv4 TCP and UDP sockets. */
> 		.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
> 		.family = AF_INET,
> 		.types = 1 << SOCK_STREAM | 1 << SOCK_DGRAM,
> 		.protocol = 0
> 	},
> 	{
> 		/* Allows IPv6 TCP and UDP sockets. */
> 		.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
> 		.family = AF_INET6,
> 		.types = 1 << SOCK_STREAM | 1 << SOCK_DGRAM,
> 		.protocol = 0
> 	},
> 	{
> 		/* Allows UNIX sockets. */
> 		.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
> 		.family = AF_UNIX,
> 		.types = 1 << SOCK_STREAM | 1 << SOCK_DGRAM,
> 		.protocol = 0
> 	},
> };

Because we are already mixing bitmasks and (protocol) value, I'm not
sure it will help much.  I think in most cases the "families" bitmask
would handle IPv4 and IPv6 the same (e.g. to only allow TCP with one
rule).  I think this is also required to be able to have a 1:1 mapping
with SELinux's socket_type_to_security_class().

> 
> > 
> > 
> > > > 
> > > > Whether the user is adding one rule to permit AF_INET+*, or whether the user is
> > > > adding two rules to permit (1) AF_INET+SOCK_STREAM and (2) AF_INET+SOCK_DGRAM,
> > > > that does not seem like a big deal to me as long as the list of such
> > > > combinations is so low?
> > > 
> > > Agreed
> > 
> > I also agree, but this might change if users have to set a combination
> > of families, types, and protocols.  This should be OK with the bitmask
> > approach though.
> > 
> > > 
> > > > 
> > > > 
> > > > > I suggest implementing something close to selinux socket classes for the
> > > > > struct landlock_socket_attr (Cf. socket_type_to_security_class()). This
> > > > > will provide protocol granularity and may be simpler and more convenient
> > > > > in the terms of determining access rights. WDYT?
> > > > 
> > > > I see that this is a longer switch statement that maps to this enum, it would be
> > > > an additional data table that would have to be documented separately for users.
> > > 
> > > This table is the general drawback, since it makes API a bit more
> > > complex.
> > > 
> > > > 
> > > > Do you have an example for how such a "security class enum" would map to the
> > > > combinations of family, type and socket for the protocols discussed above?
> > > 
> > > I think the socket_type_to_security_class() has a pretty good mapping
> > > for UNIX and IP families.
> > 
> > The mapping looks good indeed, and it has been tested for a long time
> > with many applications.  However, this would make the kernel
> > implementation more complex, and I think this mapping could easily be
> > implemented in user space libraries with the bitmask approach, if really
> > needed, which I'm not sure.
> 
> I agree, implementing this in a library is a better approach. Thanks for
> the catch!
> 
> > 
> > > 
> > > > 
> > > > If this is just a matter of actually mapping (family, type, protocol)
> > > > combinations in a more flexible way, could we get away by allowing a special
> > > > "wildcard" value for the "protocol" field, when it is used within a ruleset?
> > > > Then the LSM would have to look up whether there is a rule for (family, type,
> > > > protocol) and the only change would be that it now needs to also check whether
> > > > there is a rule for (family, type, *)?
> > > 
> > > Something like this?
> > > 
> > > const struct landlock_socket_attr create_socket_attr = {
> > > 	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
> > > 	.family = AF_INET6,
> > > 	.type = SOCK_DGRAM,
> > > 	.protocol = LANDLOCK_SOCKET_PROTO_ALL
> > > };
> > > 
> > > > 
> > > > —Günther
> > > 
>
Mikhail Ivanov Dec. 2, 2024, 11:32 a.m. UTC | #9
On 11/28/2024 11:52 PM, Mickaël Salaün wrote:
> On Thu, Nov 28, 2024 at 03:01:52PM +0300, Mikhail Ivanov wrote:
>> On 11/27/2024 9:43 PM, Mickaël Salaün wrote:
>>> On Mon, Nov 25, 2024 at 02:04:09PM +0300, Mikhail Ivanov wrote:
>>>> On 11/22/2024 8:45 PM, Günther Noack wrote:
>>>>> Hello Mikhail,
>>>>>
>>>>> sorry for the delayed response;
>>>>> I am very happy to see activity on this patch set! :)
>>>>
>>>> Hello Günther,
>>>> No problem, thanks a lot for your feedback!
>>>>
>>>>>
>>>>> On Mon, Nov 11, 2024 at 07:29:49PM +0300, Mikhail Ivanov wrote:
>>>>>> On 9/4/2024 1:48 PM, Mikhail Ivanov wrote:
>>>>>>> Landlock implements the `LANDLOCK_RULE_NET_PORT` rule type, which provides
>>>>>>> fine-grained control of actions for a specific protocol. Any action or
>>>>>>> protocol that is not supported by this rule can not be controlled. As a
>>>>>>> result, protocols for which fine-grained control is not supported can be
>>>>>>> used in a sandboxed system and lead to vulnerabilities or unexpected
>>>>>>> behavior.
>>>>>>>
>>>>>>> Controlling the protocols used will allow to use only those that are
>>>>>>> necessary for the system and/or which have fine-grained Landlock control
>>>>>>> through others types of rules (e.g. TCP bind/connect control with
>>>>>>> `LANDLOCK_RULE_NET_PORT`, UNIX bind control with
>>>>>>> `LANDLOCK_RULE_PATH_BENEATH`). Consider following examples:
>>>>>>>
>>>>>>> * Server may want to use only TCP sockets for which there is fine-grained
>>>>>>>       control of bind(2) and connect(2) actions [1].
>>>>>>> * System that does not need a network or that may want to disable network
>>>>>>>       for security reasons (e.g. [2]) can achieve this by restricting the use
>>>>>>>       of all possible protocols.
>>>>>>>
>>>>>>> This patch implements such control by restricting socket creation in a
>>>>>>> sandboxed process.
>>>>>>>
>>>>>>> Add `LANDLOCK_RULE_SOCKET` rule type that restricts actions on sockets.
>>>>>>> This rule uses values of address family and socket type (Cf. socket(2))
>>>>>>> to determine sockets that should be restricted. This is represented in a
>>>>>>> landlock_socket_attr struct:
>>>>>>>
>>>>>>>       struct landlock_socket_attr {
>>>>>>>         __u64 allowed_access;
>>>>>>>         int family; /* same as domain in socket(2) */
>>>>>>>         int type; /* see socket(2) */
>>>>>>>       };
>>>>>>
>>>>>> Hello! I'd like to consider another approach to define this structure
>>>>>> before sending the next version of this patchset.
>>>>>>
>>>>>> Currently, it has following possible issues:
>>>>>>
>>>>>> First of all, there is a lack of protocol granularity. It's impossible
>>>>>> to (for example) deny creation of ICMP and SCTP sockets and allow TCP
>>>>>> and UDP. Since the values of address family and socket type do not
>>>>>> completely define the protocol for the restriction, we may gain
>>>>>> incomplete control of the network actions. AFAICS, this is limited to
>>>>>> only a couple of IP protocol cases (e.g. it's impossible to deny SCTP
>>>>>> and SMC sockets to only allow TCP, deny ICMP and allow UDP).
>>>>>>
>>>>>> But one of the main advantages of socket access rights is the ability to
>>>>>> allow only those protocols for which there is a fine-grained control
>>>>>> over their actions (TCP bind/connect). It can be inconvenient
>>>>>> (and unsafe) for SCTP to be unrestricted, while sandboxed process only
>>>>>> needs TCP sockets.
>>>>>
>>>>> That is a good observation which I had missed.
>>>>>
>>>>> I agree with your analysis, I also see the main use case of socket()
>>>>> restrictions in:
>>>>>
>>>>>     (a) restricting socket creating altogether
>>>>>     (b) only permitting socket types for which there is fine grained control
>>>>>
>>>>> and I also agree that it would be very surprising when the same socket types
>>>>> that provide fine grained control would also open the door for unrestricted
>>>>> access to SMC, SCTP or other protocols.  We should instead strive for a
>>>>> socket() access control with which these additional protocols weren't
>>>>> accessible.
>>>>>
>>>>>
>>>>>> Adding protocol (Cf. socket(2)) field was considered a bit during the
>>>>>> initial discussion:
>>>>>> https://lore.kernel.org/all/CABi2SkVWU=Wxb2y3fP702twyHBD3kVoySPGSz2X22VckvcHeXw@mail.gmail.com/
>>>>>
>>>>> So adding "protocol" to the rule attributes would suffice to restrict the use of
>>>>> SMC and SCTP then?  (Sorry, I lost context on these protocols a bit in the
>>>>> meantime, I was so far under the impression that these were using different
>>>>> values for family and type than TCP and UDP do.)
>>>>
>>>> Yeap. Following rule will be enough to allow TCP sockets only:
>>>>
>>>> const struct landlock_socket_attr create_socket_attr = {
>>>> 	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
>>>> 	.family = AF_INET{,6},
>>>> 	.type = SOCK_STREAM,
>>>> 	.protocol = 0
>>>> };
>>>
>>> We should indeed include the protocol type in the rule definition.
>>>
>>>>
>>>> Btw, creation of SMC sockets via IP stack was added quite recently.
>>>> So far, creation has been possible only with AF_SMC family.
>>>>
>>>> https://lore.kernel.org/all/1718301630-63692-1-git-send-email-alibuda@linux.alibaba.com/
>>>>
>>>>>
>>>>>
>>>>>> Secondly, I'm not really sure if socket type granularity is required
>>>>>> for most of the protocols. It may be more convenient for the end user
>>>>>> to be able to completely restrict the address family without specifying
>>>>>> whether restriction is dedicated to stream or dgram sockets (e.g. for
>>>>>> BLUETOOTH, VSOCK sockets). However, this is not a big issue for the
>>>>>> current design, since address family can be restricted by specifying
>>>>>> type = SOCK_TYPE_MASK.
>>>
>>> It looks like SOCK_TYPE_MASK is not part of UAPI, which means it could
>>> change with kernel versions (even while being in UAPI in fact).  This
>>> new socket creation control should allow to deny any socket creation
>>> known or unknow at the time of the user space program build, and
>>> whatever the available C headers.
>>
>> Agreed
>>
>>>
>>> This also means that Landlock should accept any domain, type, and
>>> protocols defined in rules.  Indeed, we don't want to reject rules for
>>> which some protocols are not allowed.
>>
>> Do you mean that Landlock should not make any assumptions about this
>> values during a build time? Currently, patchset provides boundary checks
>> for domain (< AF_MAX) and type (< SOCK_MAX) in landlock_add_rule().
> 
> The *running kernel* may not support some socket's domains or types,
> which may be confusing for users if the rule was tested on a kernel
> supporting such domains/types. >
> For the bitmask of domains or types, the issues to keep boundary checks
> would be when a subset of them is not supported.  Landlock would reject
> such rule and it would be difficult for users to identify the cause.

Ok, I'll remove these checks.

> 
> I'm still wondering if the landlock_append_net_rule()'s -EAFNOSUPPORT
> return value for kernels without CONFIG_INET was a good idea.  We should
> probably return 0 in this case, which would be similar to not checking
> socket's domains nor types.

It seems that returning -EAFNOSUPPORT only complicates error checking
for landlock_append_net_rule() from the user's perspective. Probably the
only reason to check the correctness of restricted objects in Landlock
is to provide errors consistency in hooks.

> 
>>
>>>
>>> What about using bitmasks for the domain and type fields (renamed to
>>> "domains" and "types")?  The last protocol is currently 45/MCTP so a
>>> 64-bit field is enough, and 10/SOCK_PACKET also fits for the last socket
>>> type.
>>>
>>> We cannot do the same with the protocol because the higher one is
>>> 262/MPTCP though.  But it looks like a value of 0 (default protocol)
>>> should be enough for most use cases, and users could specify a protocol
>>> (but this time as a number, not a bitmask).
>>>
>>> To sum up, we could have something like this:
>>>
>>>     const struct landlock_socket_attr create_socket_attr = {
>>>     	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
>>>     	.families = 1 << AF_INET | 1 << AF_INET6,
>>>     	.types = 1 << SOCK_STREAM,
>>>     	.protocol = IPPROTO_SCTP
>>>     };
>>
>> Looks good! I think it's a nice approach which will provide a sufficient
>> level of flexibility to define a single rule for a specific protocol (or
>> for related protocols).
>>
>> But, this adds possibility to define a single rule for the set of
>> unrelated protocols:
>>
>> /* Allows TCP, UDP and UNIX sockets. */
>> const struct landlock_socket_attr create_socket_attr = {
>> 	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
>> 	.families = 1 << AF_INET | 1 << AF_INET6 | 1 << AF_UNIX,
>> 	.types = 1 << SOCK_STREAM | 1 << SOCK_DGRAM,
>> 	.protocol = 0
>> };
>>
>> Perhaps limiting the addition of one rule to only one address family
>> would be more clear in terms of rule semantics?:
>>
>> /* Allows TCP, UDP, UNIX STREAM, UNIX DGRAM sockets. */
>> const struct landlock_socket_attr create_socket_attrs[] = {
>> 	{
>> 		/* Allows IPv4 TCP and UDP sockets. */
>> 		.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
>> 		.family = AF_INET,
>> 		.types = 1 << SOCK_STREAM | 1 << SOCK_DGRAM,
>> 		.protocol = 0
>> 	},
>> 	{
>> 		/* Allows IPv6 TCP and UDP sockets. */
>> 		.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
>> 		.family = AF_INET6,
>> 		.types = 1 << SOCK_STREAM | 1 << SOCK_DGRAM,
>> 		.protocol = 0
>> 	},
>> 	{
>> 		/* Allows UNIX sockets. */
>> 		.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
>> 		.family = AF_UNIX,
>> 		.types = 1 << SOCK_STREAM | 1 << SOCK_DGRAM,
>> 		.protocol = 0
>> 	},
>> };
> 
> Because we are already mixing bitmasks and (protocol) value, I'm not
> sure it will help much.  I think in most cases the "families" bitmask
> would handle IPv4 and IPv6 the same (e.g. to only allow TCP with one
> rule).  I think this is also required to be able to have a 1:1 mapping
> with SELinux's socket_type_to_security_class().

Ok, agreed

> 
>>
>>>
>>>
>>>>>
>>>>> Whether the user is adding one rule to permit AF_INET+*, or whether the user is
>>>>> adding two rules to permit (1) AF_INET+SOCK_STREAM and (2) AF_INET+SOCK_DGRAM,
>>>>> that does not seem like a big deal to me as long as the list of such
>>>>> combinations is so low?
>>>>
>>>> Agreed
>>>
>>> I also agree, but this might change if users have to set a combination
>>> of families, types, and protocols.  This should be OK with the bitmask
>>> approach though.
>>>
>>>>
>>>>>
>>>>>
>>>>>> I suggest implementing something close to selinux socket classes for the
>>>>>> struct landlock_socket_attr (Cf. socket_type_to_security_class()). This
>>>>>> will provide protocol granularity and may be simpler and more convenient
>>>>>> in the terms of determining access rights. WDYT?
>>>>>
>>>>> I see that this is a longer switch statement that maps to this enum, it would be
>>>>> an additional data table that would have to be documented separately for users.
>>>>
>>>> This table is the general drawback, since it makes API a bit more
>>>> complex.
>>>>
>>>>>
>>>>> Do you have an example for how such a "security class enum" would map to the
>>>>> combinations of family, type and socket for the protocols discussed above?
>>>>
>>>> I think the socket_type_to_security_class() has a pretty good mapping
>>>> for UNIX and IP families.
>>>
>>> The mapping looks good indeed, and it has been tested for a long time
>>> with many applications.  However, this would make the kernel
>>> implementation more complex, and I think this mapping could easily be
>>> implemented in user space libraries with the bitmask approach, if really
>>> needed, which I'm not sure.
>>
>> I agree, implementing this in a library is a better approach. Thanks for
>> the catch!
>>
>>>
>>>>
>>>>>
>>>>> If this is just a matter of actually mapping (family, type, protocol)
>>>>> combinations in a more flexible way, could we get away by allowing a special
>>>>> "wildcard" value for the "protocol" field, when it is used within a ruleset?
>>>>> Then the LSM would have to look up whether there is a rule for (family, type,
>>>>> protocol) and the only change would be that it now needs to also check whether
>>>>> there is a rule for (family, type, *)?
>>>>
>>>> Something like this?
>>>>
>>>> const struct landlock_socket_attr create_socket_attr = {
>>>> 	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
>>>> 	.family = AF_INET6,
>>>> 	.type = SOCK_DGRAM,
>>>> 	.protocol = LANDLOCK_SOCKET_PROTO_ALL
>>>> };
>>>>
>>>>>
>>>>> —Günther
>>>>
>>
Mikhail Ivanov Dec. 24, 2024, 4:55 p.m. UTC | #10
On 12/2/2024 2:32 PM, Mikhail Ivanov wrote:
> On 11/28/2024 11:52 PM, Mickaël Salaün wrote:
>> On Thu, Nov 28, 2024 at 03:01:52PM +0300, Mikhail Ivanov wrote:
>>> On 11/27/2024 9:43 PM, Mickaël Salaün wrote:
>>>> On Mon, Nov 25, 2024 at 02:04:09PM +0300, Mikhail Ivanov wrote:
>>>>> On 11/22/2024 8:45 PM, Günther Noack wrote:
>>>>>> Hello Mikhail,
>>>>>>
>>>>>> sorry for the delayed response;
>>>>>> I am very happy to see activity on this patch set! :)
>>>>>
>>>>> Hello Günther,
>>>>> No problem, thanks a lot for your feedback!
>>>>>
>>>>>>
>>>>>> On Mon, Nov 11, 2024 at 07:29:49PM +0300, Mikhail Ivanov wrote:
>>>>>>> On 9/4/2024 1:48 PM, Mikhail Ivanov wrote:
>>>>>>>> Landlock implements the `LANDLOCK_RULE_NET_PORT` rule type, 
>>>>>>>> which provides
>>>>>>>> fine-grained control of actions for a specific protocol. Any 
>>>>>>>> action or
>>>>>>>> protocol that is not supported by this rule can not be 
>>>>>>>> controlled. As a
>>>>>>>> result, protocols for which fine-grained control is not 
>>>>>>>> supported can be
>>>>>>>> used in a sandboxed system and lead to vulnerabilities or 
>>>>>>>> unexpected
>>>>>>>> behavior.
>>>>>>>>
>>>>>>>> Controlling the protocols used will allow to use only those that 
>>>>>>>> are
>>>>>>>> necessary for the system and/or which have fine-grained Landlock 
>>>>>>>> control
>>>>>>>> through others types of rules (e.g. TCP bind/connect control with
>>>>>>>> `LANDLOCK_RULE_NET_PORT`, UNIX bind control with
>>>>>>>> `LANDLOCK_RULE_PATH_BENEATH`). Consider following examples:
>>>>>>>>
>>>>>>>> * Server may want to use only TCP sockets for which there is 
>>>>>>>> fine-grained
>>>>>>>>       control of bind(2) and connect(2) actions [1].
>>>>>>>> * System that does not need a network or that may want to 
>>>>>>>> disable network
>>>>>>>>       for security reasons (e.g. [2]) can achieve this by 
>>>>>>>> restricting the use
>>>>>>>>       of all possible protocols.
>>>>>>>>
>>>>>>>> This patch implements such control by restricting socket 
>>>>>>>> creation in a
>>>>>>>> sandboxed process.
>>>>>>>>
>>>>>>>> Add `LANDLOCK_RULE_SOCKET` rule type that restricts actions on 
>>>>>>>> sockets.
>>>>>>>> This rule uses values of address family and socket type (Cf. 
>>>>>>>> socket(2))
>>>>>>>> to determine sockets that should be restricted. This is 
>>>>>>>> represented in a
>>>>>>>> landlock_socket_attr struct:
>>>>>>>>
>>>>>>>>       struct landlock_socket_attr {
>>>>>>>>         __u64 allowed_access;
>>>>>>>>         int family; /* same as domain in socket(2) */
>>>>>>>>         int type; /* see socket(2) */
>>>>>>>>       };
>>>>>>>
>>>>>>> Hello! I'd like to consider another approach to define this 
>>>>>>> structure
>>>>>>> before sending the next version of this patchset.
>>>>>>>
>>>>>>> Currently, it has following possible issues:
>>>>>>>
>>>>>>> First of all, there is a lack of protocol granularity. It's 
>>>>>>> impossible
>>>>>>> to (for example) deny creation of ICMP and SCTP sockets and allow 
>>>>>>> TCP
>>>>>>> and UDP. Since the values of address family and socket type do not
>>>>>>> completely define the protocol for the restriction, we may gain
>>>>>>> incomplete control of the network actions. AFAICS, this is 
>>>>>>> limited to
>>>>>>> only a couple of IP protocol cases (e.g. it's impossible to deny 
>>>>>>> SCTP
>>>>>>> and SMC sockets to only allow TCP, deny ICMP and allow UDP).
>>>>>>>
>>>>>>> But one of the main advantages of socket access rights is the 
>>>>>>> ability to
>>>>>>> allow only those protocols for which there is a fine-grained control
>>>>>>> over their actions (TCP bind/connect). It can be inconvenient
>>>>>>> (and unsafe) for SCTP to be unrestricted, while sandboxed process 
>>>>>>> only
>>>>>>> needs TCP sockets.
>>>>>>
>>>>>> That is a good observation which I had missed.
>>>>>>
>>>>>> I agree with your analysis, I also see the main use case of socket()
>>>>>> restrictions in:
>>>>>>
>>>>>>     (a) restricting socket creating altogether
>>>>>>     (b) only permitting socket types for which there is fine 
>>>>>> grained control
>>>>>>
>>>>>> and I also agree that it would be very surprising when the same 
>>>>>> socket types
>>>>>> that provide fine grained control would also open the door for 
>>>>>> unrestricted
>>>>>> access to SMC, SCTP or other protocols.  We should instead strive 
>>>>>> for a
>>>>>> socket() access control with which these additional protocols weren't
>>>>>> accessible.
>>>>>>
>>>>>>
>>>>>>> Adding protocol (Cf. socket(2)) field was considered a bit during 
>>>>>>> the
>>>>>>> initial discussion:
>>>>>>> https://lore.kernel.org/all/CABi2SkVWU=Wxb2y3fP702twyHBD3kVoySPGSz2X22VckvcHeXw@mail.gmail.com/
>>>>>>
>>>>>> So adding "protocol" to the rule attributes would suffice to 
>>>>>> restrict the use of
>>>>>> SMC and SCTP then?  (Sorry, I lost context on these protocols a 
>>>>>> bit in the
>>>>>> meantime, I was so far under the impression that these were using 
>>>>>> different
>>>>>> values for family and type than TCP and UDP do.)
>>>>>
>>>>> Yeap. Following rule will be enough to allow TCP sockets only:
>>>>>
>>>>> const struct landlock_socket_attr create_socket_attr = {
>>>>>     .allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
>>>>>     .family = AF_INET{,6},
>>>>>     .type = SOCK_STREAM,
>>>>>     .protocol = 0
>>>>> };
>>>>
>>>> We should indeed include the protocol type in the rule definition.
>>>>
>>>>>
>>>>> Btw, creation of SMC sockets via IP stack was added quite recently.
>>>>> So far, creation has been possible only with AF_SMC family.
>>>>>
>>>>> https://lore.kernel.org/all/1718301630-63692-1-git-send-email-alibuda@linux.alibaba.com/
>>>>>
>>>>>>
>>>>>>
>>>>>>> Secondly, I'm not really sure if socket type granularity is required
>>>>>>> for most of the protocols. It may be more convenient for the end 
>>>>>>> user
>>>>>>> to be able to completely restrict the address family without 
>>>>>>> specifying
>>>>>>> whether restriction is dedicated to stream or dgram sockets (e.g. 
>>>>>>> for
>>>>>>> BLUETOOTH, VSOCK sockets). However, this is not a big issue for the
>>>>>>> current design, since address family can be restricted by specifying
>>>>>>> type = SOCK_TYPE_MASK.
>>>>
>>>> It looks like SOCK_TYPE_MASK is not part of UAPI, which means it could
>>>> change with kernel versions (even while being in UAPI in fact).  This
>>>> new socket creation control should allow to deny any socket creation
>>>> known or unknow at the time of the user space program build, and
>>>> whatever the available C headers.
>>>
>>> Agreed
>>>
>>>>
>>>> This also means that Landlock should accept any domain, type, and
>>>> protocols defined in rules.  Indeed, we don't want to reject rules for
>>>> which some protocols are not allowed.
>>>
>>> Do you mean that Landlock should not make any assumptions about this
>>> values during a build time? Currently, patchset provides boundary checks
>>> for domain (< AF_MAX) and type (< SOCK_MAX) in landlock_add_rule().
>>
>> The *running kernel* may not support some socket's domains or types,
>> which may be confusing for users if the rule was tested on a kernel
>> supporting such domains/types. >
>> For the bitmask of domains or types, the issues to keep boundary checks
>> would be when a subset of them is not supported.  Landlock would reject
>> such rule and it would be difficult for users to identify the cause.
> 
> Ok, I'll remove these checks.
> 
>>
>> I'm still wondering if the landlock_append_net_rule()'s -EAFNOSUPPORT
>> return value for kernels without CONFIG_INET was a good idea.  We should
>> probably return 0 in this case, which would be similar to not checking
>> socket's domains nor types.
> 
> It seems that returning -EAFNOSUPPORT only complicates error checking
> for landlock_append_net_rule() from the user's perspective. Probably the
> only reason to check the correctness of restricted objects in Landlock
> is to provide errors consistency in hooks.
> 
>>
>>>
>>>>
>>>> What about using bitmasks for the domain and type fields (renamed to
>>>> "domains" and "types")?  The last protocol is currently 45/MCTP so a
>>>> 64-bit field is enough, and 10/SOCK_PACKET also fits for the last 
>>>> socket
>>>> type.
>>>>
>>>> We cannot do the same with the protocol because the higher one is
>>>> 262/MPTCP though.  But it looks like a value of 0 (default protocol)
>>>> should be enough for most use cases, and users could specify a protocol
>>>> (but this time as a number, not a bitmask).
>>>>
>>>> To sum up, we could have something like this:
>>>>
>>>>     const struct landlock_socket_attr create_socket_attr = {
>>>>         .allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
>>>>         .families = 1 << AF_INET | 1 << AF_INET6,
>>>>         .types = 1 << SOCK_STREAM,
>>>>         .protocol = IPPROTO_SCTP
>>>>     };
>>>
>>> Looks good! I think it's a nice approach which will provide a sufficient
>>> level of flexibility to define a single rule for a specific protocol (or
>>> for related protocols).
>>>
>>> But, this adds possibility to define a single rule for the set of
>>> unrelated protocols:
>>>
>>> /* Allows TCP, UDP and UNIX sockets. */
>>> const struct landlock_socket_attr create_socket_attr = {
>>>     .allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
>>>     .families = 1 << AF_INET | 1 << AF_INET6 | 1 << AF_UNIX,
>>>     .types = 1 << SOCK_STREAM | 1 << SOCK_DGRAM,
>>>     .protocol = 0
>>> };
>>>
>>> Perhaps limiting the addition of one rule to only one address family
>>> would be more clear in terms of rule semantics?:
>>>
>>> /* Allows TCP, UDP, UNIX STREAM, UNIX DGRAM sockets. */
>>> const struct landlock_socket_attr create_socket_attrs[] = {
>>>     {
>>>         /* Allows IPv4 TCP and UDP sockets. */
>>>         .allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
>>>         .family = AF_INET,
>>>         .types = 1 << SOCK_STREAM | 1 << SOCK_DGRAM,
>>>         .protocol = 0
>>>     },
>>>     {
>>>         /* Allows IPv6 TCP and UDP sockets. */
>>>         .allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
>>>         .family = AF_INET6,
>>>         .types = 1 << SOCK_STREAM | 1 << SOCK_DGRAM,
>>>         .protocol = 0
>>>     },
>>>     {
>>>         /* Allows UNIX sockets. */
>>>         .allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
>>>         .family = AF_UNIX,
>>>         .types = 1 << SOCK_STREAM | 1 << SOCK_DGRAM,
>>>         .protocol = 0
>>>     },
>>> };
>>
>> Because we are already mixing bitmasks and (protocol) value, I'm not
>> sure it will help much.  I think in most cases the "families" bitmask
>> would handle IPv4 and IPv6 the same (e.g. to only allow TCP with one
>> rule).  I think this is also required to be able to have a 1:1 mapping
>> with SELinux's socket_type_to_security_class().
> 
> Ok, agreed

The bitmask approach leads to a complete refactoring of socket rule
storage. This shouldn't be a big issue, since we're gonna need
multiplexer for insert_rule(), find_rule() with a port range feature
anyway [1]. But it seems that the best approach of storing rules
composed of bitmasks is to store them in linked list and perform
linear scan in landlock_find_rule(). Any other approach is likely to
be too heavy and complex.

Do you think such refactoring is reasonable?

[1] https://github.com/landlock-lsm/linux/issues/16

> 
>>
>>>
>>>>
>>>>
>>>>>>
>>>>>> Whether the user is adding one rule to permit AF_INET+*, or 
>>>>>> whether the user is
>>>>>> adding two rules to permit (1) AF_INET+SOCK_STREAM and (2) 
>>>>>> AF_INET+SOCK_DGRAM,
>>>>>> that does not seem like a big deal to me as long as the list of such
>>>>>> combinations is so low?
>>>>>
>>>>> Agreed
>>>>
>>>> I also agree, but this might change if users have to set a combination
>>>> of families, types, and protocols.  This should be OK with the bitmask
>>>> approach though.
>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>> I suggest implementing something close to selinux socket classes 
>>>>>>> for the
>>>>>>> struct landlock_socket_attr (Cf. 
>>>>>>> socket_type_to_security_class()). This
>>>>>>> will provide protocol granularity and may be simpler and more 
>>>>>>> convenient
>>>>>>> in the terms of determining access rights. WDYT?
>>>>>>
>>>>>> I see that this is a longer switch statement that maps to this 
>>>>>> enum, it would be
>>>>>> an additional data table that would have to be documented 
>>>>>> separately for users.
>>>>>
>>>>> This table is the general drawback, since it makes API a bit more
>>>>> complex.
>>>>>
>>>>>>
>>>>>> Do you have an example for how such a "security class enum" would 
>>>>>> map to the
>>>>>> combinations of family, type and socket for the protocols 
>>>>>> discussed above?
>>>>>
>>>>> I think the socket_type_to_security_class() has a pretty good mapping
>>>>> for UNIX and IP families.
>>>>
>>>> The mapping looks good indeed, and it has been tested for a long time
>>>> with many applications.  However, this would make the kernel
>>>> implementation more complex, and I think this mapping could easily be
>>>> implemented in user space libraries with the bitmask approach, if 
>>>> really
>>>> needed, which I'm not sure.
>>>
>>> I agree, implementing this in a library is a better approach. Thanks for
>>> the catch!
>>>
>>>>
>>>>>
>>>>>>
>>>>>> If this is just a matter of actually mapping (family, type, protocol)
>>>>>> combinations in a more flexible way, could we get away by allowing 
>>>>>> a special
>>>>>> "wildcard" value for the "protocol" field, when it is used within 
>>>>>> a ruleset?
>>>>>> Then the LSM would have to look up whether there is a rule for 
>>>>>> (family, type,
>>>>>> protocol) and the only change would be that it now needs to also 
>>>>>> check whether
>>>>>> there is a rule for (family, type, *)?
>>>>>
>>>>> Something like this?
>>>>>
>>>>> const struct landlock_socket_attr create_socket_attr = {
>>>>>     .allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
>>>>>     .family = AF_INET6,
>>>>>     .type = SOCK_DGRAM,
>>>>>     .protocol = LANDLOCK_SOCKET_PROTO_ALL
>>>>> };
>>>>>
>>>>>>
>>>>>> —Günther
>>>>>
>>>
Günther Noack Jan. 10, 2025, 11:12 a.m. UTC | #11
Happy New Year!

On Tue, Dec 24, 2024 at 07:55:01PM +0300, Mikhail Ivanov wrote:
> The bitmask approach leads to a complete refactoring of socket rule
> storage. This shouldn't be a big issue, since we're gonna need
> multiplexer for insert_rule(), find_rule() with a port range feature
> anyway [1]. But it seems that the best approach of storing rules
> composed of bitmasks is to store them in linked list and perform
> linear scan in landlock_find_rule(). Any other approach is likely to
> be too heavy and complex.
> 
> Do you think such refactoring is reasonable?
> 
> [1] https://github.com/landlock-lsm/linux/issues/16

The way I understood it in your mail from Nov 28th [1], I thought that the
bitmasks would only exist at the UAPI layer so that users could more
conveniently specify multiple "types" at the same time.  In other
words, a rule which is now expressed as

  {
    .allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
    .family = AF_INET,
    .types = 1 << SOCK_STREAM | 1 << SOCK_DGRAM,
    .protocol = 0,
  },

used to be expressed like this (without bitmasks):

  {
    .allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
    .family = AF_INET,
    .type = SOCK_STREAM,
    .protocol = 0,
  },
  {
    .allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
    .family = AF_INET,
    .type = SOCK_DGRAM,
    .protocol = 0,
  },

I do not understand why this convenience feature in the UAPI layer
requires a change to the data structures that Landlock uses
internally?  As far as I can tell, struct landlock_socket_attr is only
used in syscalls.c and converted to other data structures already.  I
would have imagined that we'd "unroll" the specified bitmasks into the
possible combinations in the add_rule_socket() function and then call
landlock_append_socket_rule() multiple times with each of these?


That being said, I am not a big fan of red-black trees for such simple
integer lookups either, and I also think there should be something
better if we make more use of the properties of the input ranges. The
question is though whether you want to couple that to this socket type
patch set, or rather do it in a follow up?  (So far we have been doing
fine with the red black trees, and we are already contemplating the
possibility of changing these internal structures in [2].  We have
also used RB trees for the "port" rules with a similar reasoning,
IIRC.)

Regarding the port range feature, I am also not sure whether the data
structure for that would even be similar?  Looking for a containment
in a set of integer ranges is a different task than looking for an
exact match in a non-contiguous set of integers.

In any case, I feel that for now, an exact look up in the RB tree
would work fine as a generic solution (especially considering that the
set of added rules is probably usually small).  IMHO, finding a more
appropriate data structure might be a can of worms that could further
delay the patch set and which might better be discussed separately.

WDYT?

–Günther

[1] https://lore.kernel.org/all/eafd855d-2681-8dfd-a2be-9c02fc07050d@huawei-partners.com/
[2] https://github.com/landlock-lsm/linux/issues/1
Mikhail Ivanov Jan. 10, 2025, 1:02 p.m. UTC | #12
On 1/10/2025 2:12 PM, Günther Noack wrote:
> Happy New Year!

Happy New Year! Glad to see you :)

> 
> On Tue, Dec 24, 2024 at 07:55:01PM +0300, Mikhail Ivanov wrote:
>> The bitmask approach leads to a complete refactoring of socket rule
>> storage. This shouldn't be a big issue, since we're gonna need
>> multiplexer for insert_rule(), find_rule() with a port range feature
>> anyway [1]. But it seems that the best approach of storing rules
>> composed of bitmasks is to store them in linked list and perform
>> linear scan in landlock_find_rule(). Any other approach is likely to
>> be too heavy and complex.
>>
>> Do you think such refactoring is reasonable?
>>
>> [1] https://github.com/landlock-lsm/linux/issues/16
> 
> The way I understood it in your mail from Nov 28th [1], I thought that the
> bitmasks would only exist at the UAPI layer so that users could more
> conveniently specify multiple "types" at the same time.  In other
> words, a rule which is now expressed as
> 
>    {
>      .allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
>      .family = AF_INET,
>      .types = 1 << SOCK_STREAM | 1 << SOCK_DGRAM,
>      .protocol = 0,
>    },
> 
> used to be expressed like this (without bitmasks):
> 
>    {
>      .allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
>      .family = AF_INET,
>      .type = SOCK_STREAM,
>      .protocol = 0,
>    },
>    {
>      .allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
>      .family = AF_INET,
>      .type = SOCK_DGRAM,
>      .protocol = 0,
>    },

Correct, but we also agreed to use bitmasks for "family" field as well:

https://lore.kernel.org/all/af72be74-50c7-d251-5df3-a2c63c73296a@huawei-partners.com/

> 
> I do not understand why this convenience feature in the UAPI layer
> requires a change to the data structures that Landlock uses
> internally?  As far as I can tell, struct landlock_socket_attr is only
> used in syscalls.c and converted to other data structures already.  I
> would have imagined that we'd "unroll" the specified bitmasks into the
> possible combinations in the add_rule_socket() function and then call
> landlock_append_socket_rule() multiple times with each of these?

I thought about unrolling bitmask into multiple entries in rbtree, and
came up with following possible issue:

Imagine that a user creates a rule that allows to create sockets of all
possible families and types (with protocol=0 for example):
{
	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
	.families = INT64_MAX, /* 64 set bits */
	.types = INT16_MAX, /* 16 set bits */
	.protocol = 0,
},

This will add 64 x 16 = 1024 entries to the rbtree. Currently, the
struct landlock_rule, which is used to store rules, weighs 40B. So, it
will be 40kB by a single rule. Even if we allow rules with only
"existing" families and types, it will be 46 x 7 = 322 entries and ~12kB
by a single rule.

I understand that this may be degenerate case and most common rule will
result in less then 8 (or 4) entries in rbtree, but I think API should
be as intuitive as possible. User can expect to see the same
memory usage regardless of the content of the rule.

I'm not really sure if this case is really an issue, so I'd be glad
to hear your opinion on it.

I also initially thought that it would be difficult to handle errors
when adding rule. But it seems that it's not gonna be an issue with
correctly implemented removal (this will result in additional method in
ruleset.c and small wrapper over rule structure that would not affect
ruleset domain implementation).

> 
> 
> That being said, I am not a big fan of red-black trees for such simple
> integer lookups either, and I also think there should be something
> better if we make more use of the properties of the input ranges. The
> question is though whether you want to couple that to this socket type
> patch set, or rather do it in a follow up?  (So far we have been doing
> fine with the red black trees, and we are already contemplating the
> possibility of changing these internal structures in [2].  We have
> also used RB trees for the "port" rules with a similar reasoning,
> IIRC.)

I think it'll be better to have a separate series for [2] if the socket
restriction can be implemented without rbtree refactoring.

> 
> Regarding the port range feature, I am also not sure whether the data
> structure for that would even be similar?  Looking for a containment
> in a set of integer ranges is a different task than looking for an
> exact match in a non-contiguous set of integers.
It seems like it would be better to have a different structure for
non-ranged lookups if it results in less memory and lookup duration.
First, we need to check possible candidates for both cases.

> 
> In any case, I feel that for now, an exact look up in the RB tree
> would work fine as a generic solution (especially considering that the
> set of added rules is probably usually small).  IMHO, finding a more
> appropriate data structure might be a can of worms that could further
> delay the patch set and which might better be discussed separately.
> 
> WDYT?

I agree if you think that worst case presented above is not a big issue.

> 
> –Günther
> 
> [1] https://lore.kernel.org/all/eafd855d-2681-8dfd-a2be-9c02fc07050d@huawei-partners.com/
> [2] https://github.com/landlock-lsm/linux/issues/1
Günther Noack Jan. 10, 2025, 4:27 p.m. UTC | #13
On Fri, Jan 10, 2025 at 04:02:42PM +0300, Mikhail Ivanov wrote:
> Correct, but we also agreed to use bitmasks for "family" field as well:
> 
> https://lore.kernel.org/all/af72be74-50c7-d251-5df3-a2c63c73296a@huawei-partners.com/

OK


> On 1/10/2025 2:12 PM, Günther Noack wrote:
> > I do not understand why this convenience feature in the UAPI layer
> > requires a change to the data structures that Landlock uses
> > internally?  As far as I can tell, struct landlock_socket_attr is only
> > used in syscalls.c and converted to other data structures already.  I
> > would have imagined that we'd "unroll" the specified bitmasks into the
> > possible combinations in the add_rule_socket() function and then call
> > landlock_append_socket_rule() multiple times with each of these?
> 
> I thought about unrolling bitmask into multiple entries in rbtree, and
> came up with following possible issue:
> 
> Imagine that a user creates a rule that allows to create sockets of all
> possible families and types (with protocol=0 for example):
> {
> 	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
> 	.families = INT64_MAX, /* 64 set bits */
> 	.types = INT16_MAX, /* 16 set bits */
> 	.protocol = 0,
> },
> 
> This will add 64 x 16 = 1024 entries to the rbtree. Currently, the
> struct landlock_rule, which is used to store rules, weighs 40B. So, it
> will be 40kB by a single rule. Even if we allow rules with only
> "existing" families and types, it will be 46 x 7 = 322 entries and ~12kB
> by a single rule.
> 
> I understand that this may be degenerate case and most common rule will
> result in less then 8 (or 4) entries in rbtree, but I think API should
> be as intuitive as possible. User can expect to see the same
> memory usage regardless of the content of the rule.
> 
> I'm not really sure if this case is really an issue, so I'd be glad
> to hear your opinion on it.

I think there are two separate questions here:

(a) I think it is OK that it is *possible* to allocate 40kB of kernel
    memory.  At least, this is already possible today by calling
    landlock_add_rule() repeatedly.

    I assume that the GFP_KERNEL_ACCOUNT flag is limiting the
    potential damage to the caller here?  That flag was added in the
    Landlock v19 patch set [1] ("Account objects to kmemcg.").

(b) I agree it might be counterintuitive when a single
    landlock_add_rule() call allocates more space than expected.

Mickaël, since it is already possible today (but harder), I assume
that you have thought about this problem before -- is it a problem
when users allocate more kernel memory through Landlock than they
expected?


Naive proposal:

If this is an issue, how about we set a low limit to the number of
families and the number of types that can be used in a single
landlock_add_rule() invocation?  (It tends to be easier to start with
a restrictive API and open it up later than the other way around.)

For instance,

* In the families field, at most 2 bits may be set.
* In the types field, at most 2 bits may be set.

In my understanding, the main use case of the bit vectors is that
there is a short way to say "all IPv4+v6 stream+dgram sockets", but we
do not know scenarios where much more than that is needed?  With that,
we would still keep people from accidentally allocating larger amounts
of memory, while permitting the main use case.

Having independent limits for the family and type fields is a bit
easier to understand and document than imposing a limit on the
multiplication result.

> > That being said, I am not a big fan of red-black trees for such simple
> > integer lookups either, and I also think there should be something
> > better if we make more use of the properties of the input ranges. The
> > question is though whether you want to couple that to this socket type
> > patch set, or rather do it in a follow up?  (So far we have been doing
> > fine with the red black trees, and we are already contemplating the
> > possibility of changing these internal structures in [2].  We have
> > also used RB trees for the "port" rules with a similar reasoning,
> > IIRC.)
> 
> I think it'll be better to have a separate series for [2] if the socket
> restriction can be implemented without rbtree refactoring.

Sounds good to me. 👍

–Günther

[1] https://lore.kernel.org/all/20200707180955.53024-2-mic@digikod.net/
Mikhail Ivanov Jan. 10, 2025, 4:55 p.m. UTC | #14
On 1/10/2025 7:27 PM, Günther Noack wrote:
> On Fri, Jan 10, 2025 at 04:02:42PM +0300, Mikhail Ivanov wrote:
>> Correct, but we also agreed to use bitmasks for "family" field as well:
>>
>> https://lore.kernel.org/all/af72be74-50c7-d251-5df3-a2c63c73296a@huawei-partners.com/
> 
> OK
> 
> 
>> On 1/10/2025 2:12 PM, Günther Noack wrote:
>>> I do not understand why this convenience feature in the UAPI layer
>>> requires a change to the data structures that Landlock uses
>>> internally?  As far as I can tell, struct landlock_socket_attr is only
>>> used in syscalls.c and converted to other data structures already.  I
>>> would have imagined that we'd "unroll" the specified bitmasks into the
>>> possible combinations in the add_rule_socket() function and then call
>>> landlock_append_socket_rule() multiple times with each of these?
>>
>> I thought about unrolling bitmask into multiple entries in rbtree, and
>> came up with following possible issue:
>>
>> Imagine that a user creates a rule that allows to create sockets of all
>> possible families and types (with protocol=0 for example):
>> {
>> 	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
>> 	.families = INT64_MAX, /* 64 set bits */
>> 	.types = INT16_MAX, /* 16 set bits */
>> 	.protocol = 0,
>> },
>>
>> This will add 64 x 16 = 1024 entries to the rbtree. Currently, the
>> struct landlock_rule, which is used to store rules, weighs 40B. So, it
>> will be 40kB by a single rule. Even if we allow rules with only
>> "existing" families and types, it will be 46 x 7 = 322 entries and ~12kB
>> by a single rule.
>>
>> I understand that this may be degenerate case and most common rule will
>> result in less then 8 (or 4) entries in rbtree, but I think API should
>> be as intuitive as possible. User can expect to see the same
>> memory usage regardless of the content of the rule.
>>
>> I'm not really sure if this case is really an issue, so I'd be glad
>> to hear your opinion on it.
> 
> I think there are two separate questions here:
> 
> (a) I think it is OK that it is *possible* to allocate 40kB of kernel
>      memory.  At least, this is already possible today by calling
>      landlock_add_rule() repeatedly.
> 
>      I assume that the GFP_KERNEL_ACCOUNT flag is limiting the
>      potential damage to the caller here?  That flag was added in the
>      Landlock v19 patch set [1] ("Account objects to kmemcg.").
> 
> (b) I agree it might be counterintuitive when a single
>      landlock_add_rule() call allocates more space than expected.
> 
> Mickaël, since it is already possible today (but harder), I assume
> that you have thought about this problem before -- is it a problem
> when users allocate more kernel memory through Landlock than they
> expected?
> 
> 
> Naive proposal:
> 
> If this is an issue, how about we set a low limit to the number of
> families and the number of types that can be used in a single
> landlock_add_rule() invocation?  (It tends to be easier to start with
> a restrictive API and open it up later than the other way around.)

Looks like a good approach! Better to return with an error (which almost
always won't happen) and let the user to refactor the code than to
allow ruleset to eat an unexpected amount of memory.

> 
> For instance,
> 
> * In the families field, at most 2 bits may be set.
> * In the types field, at most 2 bits may be set.

Or we can say that rule can contain not more than 4 combinations of
family and type.

BTW, 4 seems to be sufficient, at least for IP protocols.

> 
> In my understanding, the main use case of the bit vectors is that
> there is a short way to say "all IPv4+v6 stream+dgram sockets", but we
> do not know scenarios where much more than that is needed?  With that,
> we would still keep people from accidentally allocating larger amounts
> of memory, while permitting the main use case.

Agreed

> 
> Having independent limits for the family and type fields is a bit
> easier to understand and document than imposing a limit on the
> multiplication result.
> 
>>> That being said, I am not a big fan of red-black trees for such simple
>>> integer lookups either, and I also think there should be something
>>> better if we make more use of the properties of the input ranges. The
>>> question is though whether you want to couple that to this socket type
>>> patch set, or rather do it in a follow up?  (So far we have been doing
>>> fine with the red black trees, and we are already contemplating the
>>> possibility of changing these internal structures in [2].  We have
>>> also used RB trees for the "port" rules with a similar reasoning,
>>> IIRC.)
>>
>> I think it'll be better to have a separate series for [2] if the socket
>> restriction can be implemented without rbtree refactoring.
> 
> Sounds good to me. 👍
> 
> –Günther
> 
> [1] https://lore.kernel.org/all/20200707180955.53024-2-mic@digikod.net/
Mickaël Salaün Jan. 14, 2025, 6:31 p.m. UTC | #15
Happy new year!

On Fri, Jan 10, 2025 at 07:55:10PM +0300, Mikhail Ivanov wrote:
> On 1/10/2025 7:27 PM, Günther Noack wrote:
> > On Fri, Jan 10, 2025 at 04:02:42PM +0300, Mikhail Ivanov wrote:
> > > Correct, but we also agreed to use bitmasks for "family" field as well:
> > > 
> > > https://lore.kernel.org/all/af72be74-50c7-d251-5df3-a2c63c73296a@huawei-partners.com/
> > 
> > OK
> > 
> > 
> > > On 1/10/2025 2:12 PM, Günther Noack wrote:
> > > > I do not understand why this convenience feature in the UAPI layer
> > > > requires a change to the data structures that Landlock uses
> > > > internally?  As far as I can tell, struct landlock_socket_attr is only
> > > > used in syscalls.c and converted to other data structures already.  I
> > > > would have imagined that we'd "unroll" the specified bitmasks into the
> > > > possible combinations in the add_rule_socket() function and then call
> > > > landlock_append_socket_rule() multiple times with each of these?

I agree that UAPI should not necessarily dictate the kernel
implementation.

> > > 
> > > I thought about unrolling bitmask into multiple entries in rbtree, and
> > > came up with following possible issue:
> > > 
> > > Imagine that a user creates a rule that allows to create sockets of all
> > > possible families and types (with protocol=0 for example):
> > > {
> > > 	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
> > > 	.families = INT64_MAX, /* 64 set bits */
> > > 	.types = INT16_MAX, /* 16 set bits */
> > > 	.protocol = 0,
> > > },
> > > 
> > > This will add 64 x 16 = 1024 entries to the rbtree. Currently, the
> > > struct landlock_rule, which is used to store rules, weighs 40B. So, it
> > > will be 40kB by a single rule. Even if we allow rules with only
> > > "existing" families and types, it will be 46 x 7 = 322 entries and ~12kB
> > > by a single rule.
> > > 
> > > I understand that this may be degenerate case and most common rule will
> > > result in less then 8 (or 4) entries in rbtree, but I think API should
> > > be as intuitive as possible. User can expect to see the same
> > > memory usage regardless of the content of the rule.
> > > 
> > > I'm not really sure if this case is really an issue, so I'd be glad
> > > to hear your opinion on it.
> > 
> > I think there are two separate questions here:
> > 
> > (a) I think it is OK that it is *possible* to allocate 40kB of kernel
> >      memory.  At least, this is already possible today by calling
> >      landlock_add_rule() repeatedly.
> > 
> >      I assume that the GFP_KERNEL_ACCOUNT flag is limiting the
> >      potential damage to the caller here?  That flag was added in the
> >      Landlock v19 patch set [1] ("Account objects to kmemcg.").
> > 
> > (b) I agree it might be counterintuitive when a single
> >      landlock_add_rule() call allocates more space than expected.
> > 
> > Mickaël, since it is already possible today (but harder), I assume
> > that you have thought about this problem before -- is it a problem
> > when users allocate more kernel memory through Landlock than they
> > expected?

The imbalance between the user request and the required kernel memory is
interesting.  It would not be a security issue thanks to the
GFP_KERNEL_ACCOUNT, but it can be surprising for users that for some
requests they can receive -ENOMEM but not for quite-similar ones (e.g.
with only some bits different).  We should keep the principle of least
astonishment in mind, but also the fact that the kernel implementation
and the related required memory may change between two kernel versions
for the exact same user request (because of Landlock implementation
differences or other parts of the kernel).  In summary, we should be
careful to prevent users from being able to use a large amount of memory
with only one syscall.  This which would be an usability issue.

> > 
> > 
> > Naive proposal:
> > 
> > If this is an issue, how about we set a low limit to the number of
> > families and the number of types that can be used in a single
> > landlock_add_rule() invocation?  (It tends to be easier to start with
> > a restrictive API and open it up later than the other way around.)
> 
> Looks like a good approach! Better to return with an error (which almost
> always won't happen) and let the user to refactor the code than to
> allow ruleset to eat an unexpected amount of memory.
> 
> > 
> > For instance,
> > 
> > * In the families field, at most 2 bits may be set.
> > * In the types field, at most 2 bits may be set.
> 
> Or we can say that rule can contain not more than 4 combinations of
> family and type.
> 
> BTW, 4 seems to be sufficient, at least for IP protocols.
> 
> > 
> > In my understanding, the main use case of the bit vectors is that
> > there is a short way to say "all IPv4+v6 stream+dgram sockets", but we
> > do not know scenarios where much more than that is needed?  With that,
> > we would still keep people from accidentally allocating larger amounts
> > of memory, while permitting the main use case.
> 
> Agreed
> 
> > 
> > Having independent limits for the family and type fields is a bit
> > easier to understand and document than imposing a limit on the
> > multiplication result.

This is a safer approach that can indeed be documented, but it looks
unintuitive and an arbitrary limitation.  Here is another proposal:

Let's ignore my previous suggestion to use bitfields for families and
protocols.  To keep it simple, we can get back to the initial struct but
specifically handle the (i64)-1 value (which cannot be a family,
protocol, nor a type):
{
	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
	.family = AF_INET,
	.type = SOCK_STREAM,
	.protocol = -1,
},

This would read: deny socket creation except for AF_INET with
SOCK_STREAM (and any protocol).

Users might need to add several rules (e.g. one for AF_INET and another
for AF_INET6) but that's OK.  Unfortunately we cannot identify a TCP
socket with only protocol = IPPROTO_TCP because protocol definitions
are relative to network families.  Specifying the protocol without the
family should then return an error.

Before rule could be loaded, users define how much they want to match a
socket: at least the family, optionally the type, and if the type is
also set then the protocol can also be set.  These dependencies are
required to transform this triplet to a key number, see below.

A landlock_ruleset_attr.handled_socket_layers field would define how
much we want to match a socket:
- 1: family only
- 2: family and type
- 3: family, type, and protocol

According to this ruleset's property, users will be allowed to fill the
family, type, or protocol fields in landlock_socket_attr rules.  If a
socket layer is not handled, it should contain (i64)-1 for the kernel to
detect misuse of the API.

This enables us to get a key from this triplet:

family_bits = 6; // 45 values for now
type_bits = 3; // 7 values for now
protocol_bits = 5; // 28 values for now

// attr.* are the sanitized UAPI values, including -1 replaced with 0.
// In this example, landlock_ruleset_attr.handled_socket_layers is 3, so
// the key is composed of all the 3 properties.
landlock_key.data = attr.family << (type_bits + protocol_bits) |
                    attr.type << protocol_bits | attr.protocol;

For each layer of restriction in a domain, we know how precise they
define a socket (i.e. with how many "socket layers").  We can then look
for at most 3 entries in the red-black tree: one with only the family,
another with the family and the type, and potentially a third also
including the protocol.  Each key would have the same significant bits
but with the lower bits masked according to each
landlock_ruleset_attr.handled_socket_layers .  Composing the related
access masks according to the defined socket layers, we can create an
array of struct access_masks for the request and then check if such
request is allowed by the current domain.  As for the currently stored
data, we can also identify the domain layer that blocked the request
(required for audit).

With this design, each sandbox can define a socket as much as it wants.

The downside is that we lost the bitfields and we need several calls to
filter more complex sockets (e.g. 4 for UDP and TCP with IPv4 and IPv6),
which looks OK compared to the required calls for filesystem access
control.

> > 
> > > > That being said, I am not a big fan of red-black trees for such simple
> > > > integer lookups either, and I also think there should be something
> > > > better if we make more use of the properties of the input ranges. The
> > > > question is though whether you want to couple that to this socket type
> > > > patch set, or rather do it in a follow up?  (So far we have been doing
> > > > fine with the red black trees, and we are already contemplating the
> > > > possibility of changing these internal structures in [2].  We have
> > > > also used RB trees for the "port" rules with a similar reasoning,
> > > > IIRC.)
> > > 
> > > I think it'll be better to have a separate series for [2] if the socket
> > > restriction can be implemented without rbtree refactoring.
> > 
> > Sounds good to me. 👍
> > 
> > [1] https://lore.kernel.org/all/20200707180955.53024-2-mic@digikod.net/

red-black trees are a good generic data structure for the current main
use case (for dynamic rulesets and static domains), but we'll need to
use more appropriate data structures.  I think this should not be a
blocker for this patch series.  It will be required to match (port)
ranges though (even if the use case seems limited), and in general for
better performances.
Mikhail Ivanov Jan. 24, 2025, 12:28 p.m. UTC | #16
On 1/14/2025 9:31 PM, Mickaël Salaün wrote:
> Happy new year!
> 
> On Fri, Jan 10, 2025 at 07:55:10PM +0300, Mikhail Ivanov wrote:
>> On 1/10/2025 7:27 PM, Günther Noack wrote:
>>> On Fri, Jan 10, 2025 at 04:02:42PM +0300, Mikhail Ivanov wrote:
>>>> Correct, but we also agreed to use bitmasks for "family" field as well:
>>>>
>>>> https://lore.kernel.org/all/af72be74-50c7-d251-5df3-a2c63c73296a@huawei-partners.com/
>>>
>>> OK
>>>
>>>
>>>> On 1/10/2025 2:12 PM, Günther Noack wrote:
>>>>> I do not understand why this convenience feature in the UAPI layer
>>>>> requires a change to the data structures that Landlock uses
>>>>> internally?  As far as I can tell, struct landlock_socket_attr is only
>>>>> used in syscalls.c and converted to other data structures already.  I
>>>>> would have imagined that we'd "unroll" the specified bitmasks into the
>>>>> possible combinations in the add_rule_socket() function and then call
>>>>> landlock_append_socket_rule() multiple times with each of these?
> 
> I agree that UAPI should not necessarily dictate the kernel
> implementation.
> 
>>>>
>>>> I thought about unrolling bitmask into multiple entries in rbtree, and
>>>> came up with following possible issue:
>>>>
>>>> Imagine that a user creates a rule that allows to create sockets of all
>>>> possible families and types (with protocol=0 for example):
>>>> {
>>>> 	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
>>>> 	.families = INT64_MAX, /* 64 set bits */
>>>> 	.types = INT16_MAX, /* 16 set bits */
>>>> 	.protocol = 0,
>>>> },
>>>>
>>>> This will add 64 x 16 = 1024 entries to the rbtree. Currently, the
>>>> struct landlock_rule, which is used to store rules, weighs 40B. So, it
>>>> will be 40kB by a single rule. Even if we allow rules with only
>>>> "existing" families and types, it will be 46 x 7 = 322 entries and ~12kB
>>>> by a single rule.
>>>>
>>>> I understand that this may be degenerate case and most common rule will
>>>> result in less then 8 (or 4) entries in rbtree, but I think API should
>>>> be as intuitive as possible. User can expect to see the same
>>>> memory usage regardless of the content of the rule.
>>>>
>>>> I'm not really sure if this case is really an issue, so I'd be glad
>>>> to hear your opinion on it.
>>>
>>> I think there are two separate questions here:
>>>
>>> (a) I think it is OK that it is *possible* to allocate 40kB of kernel
>>>       memory.  At least, this is already possible today by calling
>>>       landlock_add_rule() repeatedly.
>>>
>>>       I assume that the GFP_KERNEL_ACCOUNT flag is limiting the
>>>       potential damage to the caller here?  That flag was added in the
>>>       Landlock v19 patch set [1] ("Account objects to kmemcg.").
>>>
>>> (b) I agree it might be counterintuitive when a single
>>>       landlock_add_rule() call allocates more space than expected.
>>>
>>> Mickaël, since it is already possible today (but harder), I assume
>>> that you have thought about this problem before -- is it a problem
>>> when users allocate more kernel memory through Landlock than they
>>> expected?
> 
> The imbalance between the user request and the required kernel memory is
> interesting.  It would not be a security issue thanks to the
> GFP_KERNEL_ACCOUNT, but it can be surprising for users that for some
> requests they can receive -ENOMEM but not for quite-similar ones (e.g.
> with only some bits different).  We should keep the principle of least
> astonishment in mind, but also the fact that the kernel implementation
> and the related required memory may change between two kernel versions
> for the exact same user request (because of Landlock implementation
> differences or other parts of the kernel).  In summary, we should be
> careful to prevent users from being able to use a large amount of memory
> with only one syscall.  This which would be an usability issue.
> 
>>>
>>>
>>> Naive proposal:
>>>
>>> If this is an issue, how about we set a low limit to the number of
>>> families and the number of types that can be used in a single
>>> landlock_add_rule() invocation?  (It tends to be easier to start with
>>> a restrictive API and open it up later than the other way around.)
>>
>> Looks like a good approach! Better to return with an error (which almost
>> always won't happen) and let the user to refactor the code than to
>> allow ruleset to eat an unexpected amount of memory.
>>
>>>
>>> For instance,
>>>
>>> * In the families field, at most 2 bits may be set.
>>> * In the types field, at most 2 bits may be set.
>>
>> Or we can say that rule can contain not more than 4 combinations of
>> family and type.
>>
>> BTW, 4 seems to be sufficient, at least for IP protocols.
>>
>>>
>>> In my understanding, the main use case of the bit vectors is that
>>> there is a short way to say "all IPv4+v6 stream+dgram sockets", but we
>>> do not know scenarios where much more than that is needed?  With that,
>>> we would still keep people from accidentally allocating larger amounts
>>> of memory, while permitting the main use case.
>>
>> Agreed
>>
>>>
>>> Having independent limits for the family and type fields is a bit
>>> easier to understand and document than imposing a limit on the
>>> multiplication result.
> 
> This is a safer approach that can indeed be documented, but it looks
> unintuitive and an arbitrary limitation.  Here is another proposal:
> 
> Let's ignore my previous suggestion to use bitfields for families and
> protocols.  To keep it simple, we can get back to the initial struct but
> specifically handle the (i64)-1 value (which cannot be a family,
> protocol, nor a type):
> {
> 	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
> 	.family = AF_INET,
> 	.type = SOCK_STREAM,
> 	.protocol = -1,
> },
> 
> This would read: deny socket creation except for AF_INET with
> SOCK_STREAM (and any protocol).
> 
> Users might need to add several rules (e.g. one for AF_INET and another
> for AF_INET6) but that's OK.  Unfortunately we cannot identify a TCP
> socket with only protocol = IPPROTO_TCP because protocol definitions
> are relative to network families.  Specifying the protocol without the
> family should then return an error.
> 
> Before rule could be loaded, users define how much they want to match a
> socket: at least the family, optionally the type, and if the type is
> also set then the protocol can also be set.  These dependencies are
> required to transform this triplet to a key number, see below.
> 
> A landlock_ruleset_attr.handled_socket_layers field would define how
> much we want to match a socket:
> - 1: family only
> - 2: family and type
> - 3: family, type, and protocol
> 
> According to this ruleset's property, users will be allowed to fill the
> family, type, or protocol fields in landlock_socket_attr rules.  If a
> socket layer is not handled, it should contain (i64)-1 for the kernel to
> detect misuse of the API.
> 
> This enables us to get a key from this triplet:
> 
> family_bits = 6; // 45 values for now
> type_bits = 3; // 7 values for now
> protocol_bits = 5; // 28 values for now
> 
> // attr.* are the sanitized UAPI values, including -1 replaced with 0.
> // In this example, landlock_ruleset_attr.handled_socket_layers is 3, so
> // the key is composed of all the 3 properties.
> landlock_key.data = attr.family << (type_bits + protocol_bits) |
>                      attr.type << protocol_bits | attr.protocol;
> 
> For each layer of restriction in a domain, we know how precise they
> define a socket (i.e. with how many "socket layers").  We can then look
> for at most 3 entries in the red-black tree: one with only the family,
> another with the family and the type, and potentially a third also
> including the protocol.  Each key would have the same significant bits
> but with the lower bits masked according to each
> landlock_ruleset_attr.handled_socket_layers .  Composing the related
> access masks according to the defined socket layers, we can create an
> array of struct access_masks for the request and then check if such
> request is allowed by the current domain.  As for the currently stored
> data, we can also identify the domain layer that blocked the request
> (required for audit).

I do not quite understand why we need socket_layers. Without it,
user can set (i64)(-1) to type or protocol whenever he wants. While
transforming triplet to a key we can replace (i64)(-1) with some
constant (e.g. (1 << type_bits - 1) for the type if type_bits = 8 and
(1 << protocol_bits - 1) for the protocol if protocol_bits = 16).

> 
> With this design, each sandbox can define a socket as much as it wants.
> 
> The downside is that we lost the bitfields and we need several calls to
> filter more complex sockets (e.g. 4 for UDP and TCP with IPv4 and IPv6),
> which looks OK compared to the required calls for filesystem access
> control.
> 
>>>
>>>>> That being said, I am not a big fan of red-black trees for such simple
>>>>> integer lookups either, and I also think there should be something
>>>>> better if we make more use of the properties of the input ranges. The
>>>>> question is though whether you want to couple that to this socket type
>>>>> patch set, or rather do it in a follow up?  (So far we have been doing
>>>>> fine with the red black trees, and we are already contemplating the
>>>>> possibility of changing these internal structures in [2].  We have
>>>>> also used RB trees for the "port" rules with a similar reasoning,
>>>>> IIRC.)
>>>>
>>>> I think it'll be better to have a separate series for [2] if the socket
>>>> restriction can be implemented without rbtree refactoring.
>>>
>>> Sounds good to me. 👍
>>>
>>> [1] https://lore.kernel.org/all/20200707180955.53024-2-mic@digikod.net/
> 
> red-black trees are a good generic data structure for the current main
> use case (for dynamic rulesets and static domains), but we'll need to
> use more appropriate data structures.  I think this should not be a
> blocker for this patch series.  It will be required to match (port)
> ranges though (even if the use case seems limited), and in general for
> better performances.
Mickaël Salaün Jan. 24, 2025, 2:02 p.m. UTC | #17
On Fri, Jan 24, 2025 at 03:28:02PM +0300, Mikhail Ivanov wrote:
> On 1/14/2025 9:31 PM, Mickaël Salaün wrote:
> > Happy new year!
> > 
> > On Fri, Jan 10, 2025 at 07:55:10PM +0300, Mikhail Ivanov wrote:
> > > On 1/10/2025 7:27 PM, Günther Noack wrote:
> > > > On Fri, Jan 10, 2025 at 04:02:42PM +0300, Mikhail Ivanov wrote:
> > > > > Correct, but we also agreed to use bitmasks for "family" field as well:
> > > > > 
> > > > > https://lore.kernel.org/all/af72be74-50c7-d251-5df3-a2c63c73296a@huawei-partners.com/
> > > > 
> > > > OK
> > > > 
> > > > 
> > > > > On 1/10/2025 2:12 PM, Günther Noack wrote:
> > > > > > I do not understand why this convenience feature in the UAPI layer
> > > > > > requires a change to the data structures that Landlock uses
> > > > > > internally?  As far as I can tell, struct landlock_socket_attr is only
> > > > > > used in syscalls.c and converted to other data structures already.  I
> > > > > > would have imagined that we'd "unroll" the specified bitmasks into the
> > > > > > possible combinations in the add_rule_socket() function and then call
> > > > > > landlock_append_socket_rule() multiple times with each of these?
> > 
> > I agree that UAPI should not necessarily dictate the kernel
> > implementation.
> > 
> > > > > 
> > > > > I thought about unrolling bitmask into multiple entries in rbtree, and
> > > > > came up with following possible issue:
> > > > > 
> > > > > Imagine that a user creates a rule that allows to create sockets of all
> > > > > possible families and types (with protocol=0 for example):
> > > > > {
> > > > > 	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
> > > > > 	.families = INT64_MAX, /* 64 set bits */
> > > > > 	.types = INT16_MAX, /* 16 set bits */
> > > > > 	.protocol = 0,
> > > > > },
> > > > > 
> > > > > This will add 64 x 16 = 1024 entries to the rbtree. Currently, the
> > > > > struct landlock_rule, which is used to store rules, weighs 40B. So, it
> > > > > will be 40kB by a single rule. Even if we allow rules with only
> > > > > "existing" families and types, it will be 46 x 7 = 322 entries and ~12kB
> > > > > by a single rule.
> > > > > 
> > > > > I understand that this may be degenerate case and most common rule will
> > > > > result in less then 8 (or 4) entries in rbtree, but I think API should
> > > > > be as intuitive as possible. User can expect to see the same
> > > > > memory usage regardless of the content of the rule.
> > > > > 
> > > > > I'm not really sure if this case is really an issue, so I'd be glad
> > > > > to hear your opinion on it.
> > > > 
> > > > I think there are two separate questions here:
> > > > 
> > > > (a) I think it is OK that it is *possible* to allocate 40kB of kernel
> > > >       memory.  At least, this is already possible today by calling
> > > >       landlock_add_rule() repeatedly.
> > > > 
> > > >       I assume that the GFP_KERNEL_ACCOUNT flag is limiting the
> > > >       potential damage to the caller here?  That flag was added in the
> > > >       Landlock v19 patch set [1] ("Account objects to kmemcg.").
> > > > 
> > > > (b) I agree it might be counterintuitive when a single
> > > >       landlock_add_rule() call allocates more space than expected.
> > > > 
> > > > Mickaël, since it is already possible today (but harder), I assume
> > > > that you have thought about this problem before -- is it a problem
> > > > when users allocate more kernel memory through Landlock than they
> > > > expected?
> > 
> > The imbalance between the user request and the required kernel memory is
> > interesting.  It would not be a security issue thanks to the
> > GFP_KERNEL_ACCOUNT, but it can be surprising for users that for some
> > requests they can receive -ENOMEM but not for quite-similar ones (e.g.
> > with only some bits different).  We should keep the principle of least
> > astonishment in mind, but also the fact that the kernel implementation
> > and the related required memory may change between two kernel versions
> > for the exact same user request (because of Landlock implementation
> > differences or other parts of the kernel).  In summary, we should be
> > careful to prevent users from being able to use a large amount of memory
> > with only one syscall.  This which would be an usability issue.
> > 
> > > > 
> > > > 
> > > > Naive proposal:
> > > > 
> > > > If this is an issue, how about we set a low limit to the number of
> > > > families and the number of types that can be used in a single
> > > > landlock_add_rule() invocation?  (It tends to be easier to start with
> > > > a restrictive API and open it up later than the other way around.)
> > > 
> > > Looks like a good approach! Better to return with an error (which almost
> > > always won't happen) and let the user to refactor the code than to
> > > allow ruleset to eat an unexpected amount of memory.
> > > 
> > > > 
> > > > For instance,
> > > > 
> > > > * In the families field, at most 2 bits may be set.
> > > > * In the types field, at most 2 bits may be set.
> > > 
> > > Or we can say that rule can contain not more than 4 combinations of
> > > family and type.
> > > 
> > > BTW, 4 seems to be sufficient, at least for IP protocols.
> > > 
> > > > 
> > > > In my understanding, the main use case of the bit vectors is that
> > > > there is a short way to say "all IPv4+v6 stream+dgram sockets", but we
> > > > do not know scenarios where much more than that is needed?  With that,
> > > > we would still keep people from accidentally allocating larger amounts
> > > > of memory, while permitting the main use case.
> > > 
> > > Agreed
> > > 
> > > > 
> > > > Having independent limits for the family and type fields is a bit
> > > > easier to understand and document than imposing a limit on the
> > > > multiplication result.
> > 
> > This is a safer approach that can indeed be documented, but it looks
> > unintuitive and an arbitrary limitation.  Here is another proposal:
> > 
> > Let's ignore my previous suggestion to use bitfields for families and
> > protocols.  To keep it simple, we can get back to the initial struct but
> > specifically handle the (i64)-1 value (which cannot be a family,
> > protocol, nor a type):
> > {
> > 	.allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
> > 	.family = AF_INET,
> > 	.type = SOCK_STREAM,
> > 	.protocol = -1,
> > },
> > 
> > This would read: deny socket creation except for AF_INET with
> > SOCK_STREAM (and any protocol).
> > 
> > Users might need to add several rules (e.g. one for AF_INET and another
> > for AF_INET6) but that's OK.  Unfortunately we cannot identify a TCP
> > socket with only protocol = IPPROTO_TCP because protocol definitions
> > are relative to network families.  Specifying the protocol without the
> > family should then return an error.
> > 
> > Before rule could be loaded, users define how much they want to match a
> > socket: at least the family, optionally the type, and if the type is
> > also set then the protocol can also be set.  These dependencies are
> > required to transform this triplet to a key number, see below.
> > 
> > A landlock_ruleset_attr.handled_socket_layers field would define how
> > much we want to match a socket:
> > - 1: family only
> > - 2: family and type
> > - 3: family, type, and protocol
> > 
> > According to this ruleset's property, users will be allowed to fill the
> > family, type, or protocol fields in landlock_socket_attr rules.  If a
> > socket layer is not handled, it should contain (i64)-1 for the kernel to
> > detect misuse of the API.
> > 
> > This enables us to get a key from this triplet:
> > 
> > family_bits = 6; // 45 values for now
> > type_bits = 3; // 7 values for now
> > protocol_bits = 5; // 28 values for now
> > 
> > // attr.* are the sanitized UAPI values, including -1 replaced with 0.
> > // In this example, landlock_ruleset_attr.handled_socket_layers is 3, so
> > // the key is composed of all the 3 properties.
> > landlock_key.data = attr.family << (type_bits + protocol_bits) |
> >                      attr.type << protocol_bits | attr.protocol;
> > 
> > For each layer of restriction in a domain, we know how precise they
> > define a socket (i.e. with how many "socket layers").  We can then look
> > for at most 3 entries in the red-black tree: one with only the family,
> > another with the family and the type, and potentially a third also
> > including the protocol.  Each key would have the same significant bits
> > but with the lower bits masked according to each
> > landlock_ruleset_attr.handled_socket_layers .  Composing the related
> > access masks according to the defined socket layers, we can create an
> > array of struct access_masks for the request and then check if such
> > request is allowed by the current domain.  As for the currently stored
> > data, we can also identify the domain layer that blocked the request
> > (required for audit).
> 
> I do not quite understand why we need socket_layers. Without it,
> user can set (i64)(-1) to type or protocol whenever he wants. While
> transforming triplet to a key we can replace (i64)(-1) with some
> constant (e.g. (1 << type_bits - 1) for the type if type_bits = 8 and
> (1 << protocol_bits - 1) for the protocol if protocol_bits = 16).

We can indead add a virtual any/wildcard type and protocol, translated
from user space's (i64)-1 .  The downside is that for the same domain
layer we would need to check 4 different keys for the same triplet (i.e.
famility is always set, but type and protocol are optionals).  With the
socket_layers number, we only have to check for one key per domain
layer.  However, in the worse case with at least 3 domain layers having
different socket_layers value, we would still need to check for 3
different keys.  So, I think it's reasonable to get rid of the
socket_layers as you suggested (while requiring the family to always be
set).

> 
> > 
> > With this design, each sandbox can define a socket as much as it wants.
> > 
> > The downside is that we lost the bitfields and we need several calls to
> > filter more complex sockets (e.g. 4 for UDP and TCP with IPv4 and IPv6),
> > which looks OK compared to the required calls for filesystem access
> > control.
> > 
> > > > 
> > > > > > That being said, I am not a big fan of red-black trees for such simple
> > > > > > integer lookups either, and I also think there should be something
> > > > > > better if we make more use of the properties of the input ranges. The
> > > > > > question is though whether you want to couple that to this socket type
> > > > > > patch set, or rather do it in a follow up?  (So far we have been doing
> > > > > > fine with the red black trees, and we are already contemplating the
> > > > > > possibility of changing these internal structures in [2].  We have
> > > > > > also used RB trees for the "port" rules with a similar reasoning,
> > > > > > IIRC.)
> > > > > 
> > > > > I think it'll be better to have a separate series for [2] if the socket
> > > > > restriction can be implemented without rbtree refactoring.
> > > > 
> > > > Sounds good to me. 👍
> > > > 
> > > > [1] https://lore.kernel.org/all/20200707180955.53024-2-mic@digikod.net/
> > 
> > red-black trees are a good generic data structure for the current main
> > use case (for dynamic rulesets and static domains), but we'll need to
> > use more appropriate data structures.  I think this should not be a
> > blocker for this patch series.  It will be required to match (port)
> > ranges though (even if the use case seems limited), and in general for
> > better performances.
>
diff mbox series

Patch

diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
index 2c8dbc74b955..d9da9f2c0640 100644
--- a/include/uapi/linux/landlock.h
+++ b/include/uapi/linux/landlock.h
@@ -44,6 +44,13 @@  struct landlock_ruleset_attr {
 	 * flags`_).
 	 */
 	__u64 handled_access_net;
+
+	/**
+	 * @handled_access_socket: Bitmask of actions (cf. `Socket flags`_)
+	 * that is handled by this ruleset and should then be forbidden if no
+	 * rule explicitly allow them.
+	 */
+	__u64 handled_access_socket;
 };
 
 /*
@@ -72,6 +79,11 @@  enum landlock_rule_type {
 	 * landlock_net_port_attr .
 	 */
 	LANDLOCK_RULE_NET_PORT,
+	/**
+	 * @LANDLOCK_RULE_SOCKET: Type of a &struct
+	 * landlock_socket_attr .
+	 */
+	LANDLOCK_RULE_SOCKET,
 };
 
 /**
@@ -123,6 +135,32 @@  struct landlock_net_port_attr {
 	__u64 port;
 };
 
+/**
+ * struct landlock_socket_attr - Socket definition
+ *
+ * Argument of sys_landlock_add_rule().
+ */
+struct landlock_socket_attr {
+	/**
+	 * @allowed_access: Bitmask of allowed access for a socket
+	 * (cf. `Socket flags`_).
+	 */
+	__u64 allowed_access;
+	/**
+	 * @family: Protocol family used for communication
+	 * (same as domain in socket(2)).
+	 *
+	 * This argument is considered valid if it is in the range [0, AF_MAX).
+	 */
+	int family;
+	/**
+	 * @type: Socket type (see socket(2)).
+	 *
+	 * This argument is considered valid if it is in the range [0, SOCK_MAX).
+	 */
+	int type;
+};
+
 /**
  * DOC: fs_access
  *
@@ -259,7 +297,7 @@  struct landlock_net_port_attr {
  * DOC: net_access
  *
  * Network flags
- * ~~~~~~~~~~~~~~~~
+ * ~~~~~~~~~~~~~
  *
  * These flags enable to restrict a sandboxed process to a set of network
  * actions. This is supported since the Landlock ABI version 4.
@@ -274,4 +312,25 @@  struct landlock_net_port_attr {
 #define LANDLOCK_ACCESS_NET_BIND_TCP			(1ULL << 0)
 #define LANDLOCK_ACCESS_NET_CONNECT_TCP			(1ULL << 1)
 /* clang-format on */
+
+/**
+ * DOC: socket_access
+ *
+ * Socket flags
+ * ~~~~~~~~~~~~
+ *
+ * These flags restrict actions on sockets for a sandboxed process (e.g. socket
+ * creation). Sockets opened before sandboxing are not subject to these
+ * restrictions. This is supported since the Landlock ABI version 6.
+ *
+ * The following access right apply only to sockets:
+ *
+ * - %LANDLOCK_ACCESS_SOCKET_CREATE: Create an user space socket. This access
+ *   right restricts following operations:
+ *   * :manpage:`socket(2)`, :manpage:`socketpair(2)`,
+ *   * ``IORING_OP_SOCKET`` io_uring operation (see :manpage:`io_uring_enter(2)`),
+ */
+/* clang-format off */
+#define LANDLOCK_ACCESS_SOCKET_CREATE			(1ULL << 0)
+/* clang-format on */
 #endif /* _UAPI_LINUX_LANDLOCK_H */
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index b4538b7cf7d2..ff1dd98f6a1b 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -1,6 +1,6 @@ 
 obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
 
 landlock-y := setup.o syscalls.o object.o ruleset.o \
-	cred.o task.o fs.o
+	cred.o task.o fs.o socket.o
 
 landlock-$(CONFIG_INET) += net.o
diff --git a/security/landlock/limits.h b/security/landlock/limits.h
index 4eb643077a2a..2c04dca414c7 100644
--- a/security/landlock/limits.h
+++ b/security/landlock/limits.h
@@ -26,6 +26,10 @@ 
 #define LANDLOCK_MASK_ACCESS_NET	((LANDLOCK_LAST_ACCESS_NET << 1) - 1)
 #define LANDLOCK_NUM_ACCESS_NET		__const_hweight64(LANDLOCK_MASK_ACCESS_NET)
 
+#define LANDLOCK_LAST_ACCESS_SOCKET	    LANDLOCK_ACCESS_SOCKET_CREATE
+#define LANDLOCK_MASK_ACCESS_SOCKET	    ((LANDLOCK_LAST_ACCESS_SOCKET << 1) - 1)
+#define LANDLOCK_NUM_ACCESS_SOCKET		__const_hweight64(LANDLOCK_MASK_ACCESS_SOCKET)
+
 /* clang-format on */
 
 #endif /* _SECURITY_LANDLOCK_LIMITS_H */
diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c
index 6ff232f58618..9bf5e5e88544 100644
--- a/security/landlock/ruleset.c
+++ b/security/landlock/ruleset.c
@@ -40,6 +40,7 @@  static struct landlock_ruleset *create_ruleset(const u32 num_layers)
 #if IS_ENABLED(CONFIG_INET)
 	new_ruleset->root_net_port = RB_ROOT;
 #endif /* IS_ENABLED(CONFIG_INET) */
+	new_ruleset->root_socket = RB_ROOT;
 
 	new_ruleset->num_layers = num_layers;
 	/*
@@ -52,12 +53,13 @@  static struct landlock_ruleset *create_ruleset(const u32 num_layers)
 
 struct landlock_ruleset *
 landlock_create_ruleset(const access_mask_t fs_access_mask,
-			const access_mask_t net_access_mask)
+			const access_mask_t net_access_mask,
+			const access_mask_t socket_access_mask)
 {
 	struct landlock_ruleset *new_ruleset;
 
 	/* Informs about useless ruleset. */
-	if (!fs_access_mask && !net_access_mask)
+	if (!fs_access_mask && !net_access_mask && !socket_access_mask)
 		return ERR_PTR(-ENOMSG);
 	new_ruleset = create_ruleset(1);
 	if (IS_ERR(new_ruleset))
@@ -66,6 +68,9 @@  landlock_create_ruleset(const access_mask_t fs_access_mask,
 		landlock_add_fs_access_mask(new_ruleset, fs_access_mask, 0);
 	if (net_access_mask)
 		landlock_add_net_access_mask(new_ruleset, net_access_mask, 0);
+	if (socket_access_mask)
+		landlock_add_socket_access_mask(new_ruleset, socket_access_mask,
+						0);
 	return new_ruleset;
 }
 
@@ -89,6 +94,9 @@  static bool is_object_pointer(const enum landlock_key_type key_type)
 		return false;
 #endif /* IS_ENABLED(CONFIG_INET) */
 
+	case LANDLOCK_KEY_SOCKET:
+		return false;
+
 	default:
 		WARN_ON_ONCE(1);
 		return false;
@@ -146,6 +154,9 @@  static struct rb_root *get_root(struct landlock_ruleset *const ruleset,
 		return &ruleset->root_net_port;
 #endif /* IS_ENABLED(CONFIG_INET) */
 
+	case LANDLOCK_KEY_SOCKET:
+		return &ruleset->root_socket;
+
 	default:
 		WARN_ON_ONCE(1);
 		return ERR_PTR(-EINVAL);
@@ -395,6 +406,11 @@  static int merge_ruleset(struct landlock_ruleset *const dst,
 		goto out_unlock;
 #endif /* IS_ENABLED(CONFIG_INET) */
 
+	/* Merges the @src socket tree. */
+	err = merge_tree(dst, src, LANDLOCK_KEY_SOCKET);
+	if (err)
+		goto out_unlock;
+
 out_unlock:
 	mutex_unlock(&src->lock);
 	mutex_unlock(&dst->lock);
@@ -458,6 +474,11 @@  static int inherit_ruleset(struct landlock_ruleset *const parent,
 		goto out_unlock;
 #endif /* IS_ENABLED(CONFIG_INET) */
 
+	/* Copies the @parent socket tree. */
+	err = inherit_tree(parent, child, LANDLOCK_KEY_SOCKET);
+	if (err)
+		goto out_unlock;
+
 	if (WARN_ON_ONCE(child->num_layers <= parent->num_layers)) {
 		err = -EINVAL;
 		goto out_unlock;
@@ -494,6 +515,10 @@  static void free_ruleset(struct landlock_ruleset *const ruleset)
 		free_rule(freeme, LANDLOCK_KEY_NET_PORT);
 #endif /* IS_ENABLED(CONFIG_INET) */
 
+	rbtree_postorder_for_each_entry_safe(freeme, next,
+					     &ruleset->root_socket, node)
+		free_rule(freeme, LANDLOCK_KEY_SOCKET);
+
 	put_hierarchy(ruleset->hierarchy);
 	kfree(ruleset);
 }
@@ -704,6 +729,10 @@  landlock_init_layer_masks(const struct landlock_ruleset *const domain,
 		break;
 #endif /* IS_ENABLED(CONFIG_INET) */
 
+	case LANDLOCK_KEY_SOCKET:
+		get_access_mask = landlock_get_socket_access_mask;
+		num_access = LANDLOCK_NUM_ACCESS_SOCKET;
+		break;
 	default:
 		WARN_ON_ONCE(1);
 		return 0;
diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h
index 0f1b5b4c8f6b..5cf7251e11ca 100644
--- a/security/landlock/ruleset.h
+++ b/security/landlock/ruleset.h
@@ -42,6 +42,7 @@  static_assert(sizeof(unsigned long) >= sizeof(access_mask_t));
 struct access_masks {
 	access_mask_t fs : LANDLOCK_NUM_ACCESS_FS;
 	access_mask_t net : LANDLOCK_NUM_ACCESS_NET;
+	access_mask_t socket : LANDLOCK_NUM_ACCESS_SOCKET;
 };
 
 typedef u16 layer_mask_t;
@@ -92,6 +93,12 @@  enum landlock_key_type {
 	 * node keys.
 	 */
 	LANDLOCK_KEY_NET_PORT,
+
+	/**
+	 * @LANDLOCK_KEY_SOCKET: Type of &landlock_ruleset.root_socket's
+	 * node keys.
+	 */
+	LANDLOCK_KEY_SOCKET,
 };
 
 /**
@@ -177,6 +184,15 @@  struct landlock_ruleset {
 	struct rb_root root_net_port;
 #endif /* IS_ENABLED(CONFIG_INET) */
 
+	/**
+	 * @root_socket: Root of a red-black tree containing &struct
+	 * landlock_rule nodes with socket type, described by (family, type)
+	 * pair (see socket(2)). Once a ruleset is tied to a
+	 * process (i.e. as a domain), this tree is immutable until @usage
+	 * reaches zero.
+	 */
+	struct rb_root root_socket;
+
 	/**
 	 * @hierarchy: Enables hierarchy identification even when a parent
 	 * domain vanishes.  This is needed for the ptrace protection.
@@ -215,8 +231,10 @@  struct landlock_ruleset {
 			 */
 			u32 num_layers;
 			/**
-			 * @access_masks: Contains the subset of filesystem and
-			 * network actions that are restricted by a ruleset.
+			 * @access_masks: Contains the subset of filesystem,
+			 * network and socket actions that are restricted by
+			 * a ruleset.
+			 *
 			 * A domain saves all layers of merged rulesets in a
 			 * stack (FAM), starting from the first layer to the
 			 * last one.  These layers are used when merging
@@ -233,7 +251,8 @@  struct landlock_ruleset {
 
 struct landlock_ruleset *
 landlock_create_ruleset(const access_mask_t access_mask_fs,
-			const access_mask_t access_mask_net);
+			const access_mask_t access_mask_net,
+			const access_mask_t access_mask_socket);
 
 void landlock_put_ruleset(struct landlock_ruleset *const ruleset);
 void landlock_put_ruleset_deferred(struct landlock_ruleset *const ruleset);
@@ -280,6 +299,19 @@  landlock_add_net_access_mask(struct landlock_ruleset *const ruleset,
 	ruleset->access_masks[layer_level].net |= net_mask;
 }
 
+static inline void
+landlock_add_socket_access_mask(struct landlock_ruleset *const ruleset,
+				const access_mask_t socket_access_mask,
+				const u16 layer_level)
+{
+	access_mask_t socket_mask = socket_access_mask &
+				    LANDLOCK_MASK_ACCESS_SOCKET;
+
+	/* Should already be checked in sys_landlock_create_ruleset(). */
+	WARN_ON_ONCE(socket_access_mask != socket_mask);
+	ruleset->access_masks[layer_level].socket |= socket_mask;
+}
+
 static inline access_mask_t
 landlock_get_raw_fs_access_mask(const struct landlock_ruleset *const ruleset,
 				const u16 layer_level)
@@ -303,6 +335,13 @@  landlock_get_net_access_mask(const struct landlock_ruleset *const ruleset,
 	return ruleset->access_masks[layer_level].net;
 }
 
+static inline access_mask_t
+landlock_get_socket_access_mask(const struct landlock_ruleset *const ruleset,
+				const u16 layer_level)
+{
+	return ruleset->access_masks[layer_level].socket;
+}
+
 bool landlock_unmask_layers(const struct landlock_rule *const rule,
 			    const access_mask_t access_request,
 			    layer_mask_t (*const layer_masks)[],
diff --git a/security/landlock/socket.c b/security/landlock/socket.c
new file mode 100644
index 000000000000..cad89bb91678
--- /dev/null
+++ b/security/landlock/socket.c
@@ -0,0 +1,69 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Landlock LSM - Socket management and hooks
+ *
+ * Copyright © 2024 Huawei Tech. Co., Ltd.
+ */
+
+#include <linux/net.h>
+#include <linux/socket.h>
+#include <linux/stddef.h>
+
+#include "limits.h"
+#include "ruleset.h"
+#include "socket.h"
+
+static uintptr_t pack_socket_key(const int family, const int type)
+{
+	union {
+		struct {
+			unsigned short family, type;
+		} __packed data;
+		unsigned int packed;
+	} socket_key;
+
+	/*
+	 * Checks that all supported socket families and types can be stored
+	 * in socket_key.
+	 */
+	BUILD_BUG_ON(AF_MAX >= (typeof(socket_key.data.family))~0);
+	BUILD_BUG_ON(SOCK_MAX >= (typeof(socket_key.data.type))~0);
+
+	/* Checks that socket_key can be stored in landlock_key. */
+	BUILD_BUG_ON(sizeof(socket_key.data) > sizeof(socket_key.packed));
+	BUILD_BUG_ON(sizeof(socket_key.packed) >
+		     sizeof_field(union landlock_key, data));
+
+	socket_key.data.family = (unsigned short)family;
+	socket_key.data.type = (unsigned short)type;
+
+	return socket_key.packed;
+}
+
+int landlock_append_socket_rule(struct landlock_ruleset *const ruleset,
+				int family, int type,
+				access_mask_t access_rights)
+{
+	int err;
+	/*
+	 * (AF_INET, SOCK_PACKET) is an alias for (AF_PACKET, SOCK_PACKET)
+	 * (Cf. __sock_create).
+	 */
+	if (family == AF_INET && type == SOCK_PACKET)
+		family = AF_PACKET;
+
+	const struct landlock_id id = {
+		.key.data = pack_socket_key(family, type),
+		.type = LANDLOCK_KEY_SOCKET,
+	};
+
+	/* Transforms relative access rights to absolute ones. */
+	access_rights |= LANDLOCK_MASK_ACCESS_SOCKET &
+			 ~landlock_get_socket_access_mask(ruleset, 0);
+
+	mutex_lock(&ruleset->lock);
+	err = landlock_insert_rule(ruleset, id, access_rights);
+	mutex_unlock(&ruleset->lock);
+
+	return err;
+}
diff --git a/security/landlock/socket.h b/security/landlock/socket.h
new file mode 100644
index 000000000000..8519357f1c39
--- /dev/null
+++ b/security/landlock/socket.h
@@ -0,0 +1,17 @@ 
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Landlock LSM - Socket management and hooks
+ *
+ * Copyright © 2024 Huawei Tech. Co., Ltd.
+ */
+
+#ifndef _SECURITY_LANDLOCK_SOCKET_H
+#define _SECURITY_LANDLOCK_SOCKET_H
+
+#include "ruleset.h"
+
+int landlock_append_socket_rule(struct landlock_ruleset *const ruleset,
+				const int family, const int type,
+				access_mask_t access_rights);
+
+#endif /* _SECURITY_LANDLOCK_SOCKET_H */
diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index ccc8bc6c1584..026033e4ecb6 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -24,12 +24,14 @@ 
 #include <linux/syscalls.h>
 #include <linux/types.h>
 #include <linux/uaccess.h>
+#include <linux/net.h>
 #include <uapi/linux/landlock.h>
 
 #include "cred.h"
 #include "fs.h"
 #include "limits.h"
 #include "net.h"
+#include "socket.h"
 #include "ruleset.h"
 #include "setup.h"
 
@@ -88,7 +90,8 @@  static void build_check_abi(void)
 	struct landlock_ruleset_attr ruleset_attr;
 	struct landlock_path_beneath_attr path_beneath_attr;
 	struct landlock_net_port_attr net_port_attr;
-	size_t ruleset_size, path_beneath_size, net_port_size;
+	struct landlock_socket_attr socket_attr;
+	size_t ruleset_size, path_beneath_size, net_port_size, socket_size;
 
 	/*
 	 * For each user space ABI structures, first checks that there is no
@@ -97,8 +100,9 @@  static void build_check_abi(void)
 	 */
 	ruleset_size = sizeof(ruleset_attr.handled_access_fs);
 	ruleset_size += sizeof(ruleset_attr.handled_access_net);
+	ruleset_size += sizeof(ruleset_attr.handled_access_socket);
 	BUILD_BUG_ON(sizeof(ruleset_attr) != ruleset_size);
-	BUILD_BUG_ON(sizeof(ruleset_attr) != 16);
+	BUILD_BUG_ON(sizeof(ruleset_attr) != 24);
 
 	path_beneath_size = sizeof(path_beneath_attr.allowed_access);
 	path_beneath_size += sizeof(path_beneath_attr.parent_fd);
@@ -109,6 +113,12 @@  static void build_check_abi(void)
 	net_port_size += sizeof(net_port_attr.port);
 	BUILD_BUG_ON(sizeof(net_port_attr) != net_port_size);
 	BUILD_BUG_ON(sizeof(net_port_attr) != 16);
+
+	socket_size = sizeof(socket_attr.allowed_access);
+	socket_size += sizeof(socket_attr.family);
+	socket_size += sizeof(socket_attr.type);
+	BUILD_BUG_ON(sizeof(socket_attr) != socket_size);
+	BUILD_BUG_ON(sizeof(socket_attr) != 16);
 }
 
 /* Ruleset handling */
@@ -149,7 +159,7 @@  static const struct file_operations ruleset_fops = {
 	.write = fop_dummy_write,
 };
 
-#define LANDLOCK_ABI_VERSION 5
+#define LANDLOCK_ABI_VERSION 6
 
 /**
  * sys_landlock_create_ruleset - Create a new ruleset
@@ -213,9 +223,15 @@  SYSCALL_DEFINE3(landlock_create_ruleset,
 	    LANDLOCK_MASK_ACCESS_NET)
 		return -EINVAL;
 
+	/* Checks socket content (and 32-bits cast). */
+	if ((ruleset_attr.handled_access_socket |
+	     LANDLOCK_MASK_ACCESS_SOCKET) != LANDLOCK_MASK_ACCESS_SOCKET)
+		return -EINVAL;
+
 	/* Checks arguments and transforms to kernel struct. */
 	ruleset = landlock_create_ruleset(ruleset_attr.handled_access_fs,
-					  ruleset_attr.handled_access_net);
+					  ruleset_attr.handled_access_net,
+					  ruleset_attr.handled_access_socket);
 	if (IS_ERR(ruleset))
 		return PTR_ERR(ruleset);
 
@@ -371,6 +387,45 @@  static int add_rule_net_port(struct landlock_ruleset *ruleset,
 					net_port_attr.allowed_access);
 }
 
+static int add_rule_socket(struct landlock_ruleset *ruleset,
+			   const void __user *const rule_attr)
+{
+	struct landlock_socket_attr socket_attr;
+	int family, type;
+	int res;
+	access_mask_t mask;
+
+	/* Copies raw user space buffer. */
+	res = copy_from_user(&socket_attr, rule_attr, sizeof(socket_attr));
+	if (res)
+		return -EFAULT;
+
+	/*
+	 * Informs about useless rule: empty allowed_access (i.e. deny rules)
+	 * are ignored by socket actions.
+	 */
+	if (!socket_attr.allowed_access)
+		return -ENOMSG;
+
+	/* Checks that allowed_access matches the @ruleset constraints. */
+	mask = landlock_get_socket_access_mask(ruleset, 0);
+	if ((socket_attr.allowed_access | mask) != mask)
+		return -EINVAL;
+
+	family = socket_attr.family;
+	type = socket_attr.type;
+
+	/* Denies inserting a rule with family and type outside the range. */
+	if (family < 0 || family >= AF_MAX)
+		return -EINVAL;
+	if (type < 0 || type >= SOCK_MAX)
+		return -EINVAL;
+
+	/* Imports the new rule. */
+	return landlock_append_socket_rule(ruleset, family, type,
+					   socket_attr.allowed_access);
+}
+
 /**
  * sys_landlock_add_rule - Add a new rule to a ruleset
  *
@@ -430,6 +485,9 @@  SYSCALL_DEFINE4(landlock_add_rule, const int, ruleset_fd,
 	case LANDLOCK_RULE_NET_PORT:
 		err = add_rule_net_port(ruleset, rule_attr);
 		break;
+	case LANDLOCK_RULE_SOCKET:
+		err = add_rule_socket(ruleset, rule_attr);
+		break;
 	default:
 		err = -EINVAL;
 		break;
diff --git a/tools/testing/selftests/landlock/base_test.c b/tools/testing/selftests/landlock/base_test.c
index 3b26bf3cf5b9..1bc16fde2e8a 100644
--- a/tools/testing/selftests/landlock/base_test.c
+++ b/tools/testing/selftests/landlock/base_test.c
@@ -76,7 +76,7 @@  TEST(abi_version)
 	const struct landlock_ruleset_attr ruleset_attr = {
 		.handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE,
 	};
-	ASSERT_EQ(5, landlock_create_ruleset(NULL, 0,
+	ASSERT_EQ(6, landlock_create_ruleset(NULL, 0,
 					     LANDLOCK_CREATE_RULESET_VERSION));
 
 	ASSERT_EQ(-1, landlock_create_ruleset(&ruleset_attr, 0,