Message ID: 20240311141454.31537-2-tianquan23@gmail.com
State: Deferred
Series: [v3,nf-next,1/2] netfilter: nf_tables: use struct nlattr * to store userdata for nft_table
Quan Tian <tianquan23@gmail.com> wrote:
> The NFTA_TABLE_USERDATA attribute was ignored on updates. The patch adds
> handling for it to support table comment updates.

One generic API question below. Pablo, please look at this too.

> 	case NFT_MSG_NEWTABLE:
> 		if (nft_trans_table_update(trans)) {
> -			if (!(trans->ctx.table->flags & __NFT_TABLE_F_UPDATE)) {
> -				nft_trans_destroy(trans);
> -				break;
> +			if (trans->ctx.table->flags & __NFT_TABLE_F_UPDATE) {
> +				if (trans->ctx.table->flags & NFT_TABLE_F_DORMANT)
> +					nf_tables_table_disable(net, trans->ctx.table);
> +				trans->ctx.table->flags &= ~__NFT_TABLE_F_UPDATE;
> 			}
> -			if (trans->ctx.table->flags & NFT_TABLE_F_DORMANT)
> -				nf_tables_table_disable(net, trans->ctx.table);
> -
> -			trans->ctx.table->flags &= ~__NFT_TABLE_F_UPDATE;
> +			swap(trans->ctx.table->udata, nft_trans_table_udata(trans));
> +			nf_tables_table_notify(&trans->ctx, NFT_MSG_NEWTABLE

AFAICS this means that if the table has udata attached, and userspace
makes an update request without a UDATA netlink attribute, we will
delete the existing udata.

Is that right?

My question is, should we instead leave the existing udata as-is and not
support removal, only replace?
On Tue, Mar 12, 2024 at 01:27:58PM +0100, Florian Westphal wrote:
> Quan Tian <tianquan23@gmail.com> wrote:
> > The NFTA_TABLE_USERDATA attribute was ignored on updates. The patch adds
> > handling for it to support table comment updates.
>
> One generic API question below. Pablo, please look at this too.
>
> > 	case NFT_MSG_NEWTABLE:
> > 		if (nft_trans_table_update(trans)) {
> > -			if (!(trans->ctx.table->flags & __NFT_TABLE_F_UPDATE)) {
> > -				nft_trans_destroy(trans);
> > -				break;
> > +			if (trans->ctx.table->flags & __NFT_TABLE_F_UPDATE) {
> > +				if (trans->ctx.table->flags & NFT_TABLE_F_DORMANT)
> > +					nf_tables_table_disable(net, trans->ctx.table);
> > +				trans->ctx.table->flags &= ~__NFT_TABLE_F_UPDATE;
> > 			}
> > -			if (trans->ctx.table->flags & NFT_TABLE_F_DORMANT)
> > -				nf_tables_table_disable(net, trans->ctx.table);
> > -
> > -			trans->ctx.table->flags &= ~__NFT_TABLE_F_UPDATE;
> > +			swap(trans->ctx.table->udata, nft_trans_table_udata(trans));
> > +			nf_tables_table_notify(&trans->ctx, NFT_MSG_NEWTABLE
>
> AFAICS this means that if the table has udata attached, and userspace
> makes an update request without a UDATA netlink attribute, we will
> delete the existing udata.
>
> Is that right?
>
> My question is, should we instead leave the existing udata as-is and not
> support removal, only replace?

I would leave it in place too if no _USERDATA is specified.

One more question is if the memcmp() with old and new udata makes
sense considering two consecutive requests for _USERDATA update in one
batch.
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > AFAICS this means that if the table has udata attached, and userspace
> > makes an update request without a UDATA netlink attribute, we will
> > delete the existing udata.
> >
> > Is that right?
> >
> > My question is, should we instead leave the existing udata as-is and not
> > support removal, only replace?
>
> I would leave it in place too if no _USERDATA is specified.
>
> One more question is if the memcmp() with old and new udata makes
> sense considering two consecutive requests for _USERDATA update in one
> batch.

Great point, any second udata change request in the same batch must fail.

We learned this the hard way with flag updates :(
On Mon, Mar 11, 2024 at 10:14:54PM +0800, Quan Tian wrote:
> The NFTA_TABLE_USERDATA attribute was ignored on updates. The patch adds
> handling for it to support table comment updates.

dump path is lockless:

	if (table->udata) {
		if (nla_put(skb, NFTA_TABLE_USERDATA, table->udlen, table->udata))
			goto nla_put_failure;
	}

there are two things to update at the same time here, table->udata and
table->udlen.

This needs to be reworked fully if updates are required.

	struct nft_userdata {
		const u8	*data;
		u16		len;
	};

then, update struct nft_table to have:

	struct nft_userdata __rcu *user;

then, from dump path, rcu_dereference struct nft_userdata pointer.

BTW, does swap() ensure rcu semantics?

> Signed-off-by: Quan Tian <tianquan23@gmail.com>
> ---
> v2: Change to store userdata in struct nlattr * to ensure atomic update
> v3: Do not call nft_trans_destroy() on table updates in nf_tables_commit()
>
>  include/net/netfilter/nf_tables.h |  3 ++
>  net/netfilter/nf_tables_api.c     | 56 ++++++++++++++++++++++---------
>  2 files changed, 43 insertions(+), 16 deletions(-)
>
> diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
> index 144dc469ebf8..43c747bd3f80 100644
> --- a/include/net/netfilter/nf_tables.h
> +++ b/include/net/netfilter/nf_tables.h
> @@ -1684,10 +1684,13 @@ struct nft_trans_chain {
>
>  struct nft_trans_table {
>  	bool				update;
> +	struct nlattr			*udata;
>  };
>
>  #define nft_trans_table_update(trans)	\
>  	(((struct nft_trans_table *)trans->data)->update)
> +#define nft_trans_table_udata(trans)	\
> +	(((struct nft_trans_table *)trans->data)->udata)
>
>  struct nft_trans_elem {
>  	struct nft_set			*set;
> diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
> index 85088297dd0d..62a2a1798052 100644
> --- a/net/netfilter/nf_tables_api.c
> +++ b/net/netfilter/nf_tables_api.c
> @@ -188,6 +188,17 @@ static struct nlattr *nft_userdata_dup(const struct nlattr *udata, gfp_t gfp)
>  	return kmemdup(udata, nla_total_size(nla_len(udata)), gfp);
>  }
>
> +static bool nft_userdata_is_same(const struct nlattr *a, const struct nlattr *b)
> +{
> +	if (!a && !b)
> +		return true;
> +	if (!a || !b)
> +		return false;
> +	if (nla_len(a) != nla_len(b))
> +		return false;
> +	return !memcmp(nla_data(a), nla_data(b), nla_len(a));
> +}
> +
>  static void __nft_set_trans_bind(const struct nft_ctx *ctx, struct nft_set *set,
>  				 bool bind)
>  {
> @@ -1210,16 +1221,16 @@ static int nf_tables_updtable(struct nft_ctx *ctx)
>  {
>  	struct nft_trans *trans;
>  	u32 flags;
> +	const struct nlattr *udata = ctx->nla[NFTA_TABLE_USERDATA];
>  	int ret;
>
> -	if (!ctx->nla[NFTA_TABLE_FLAGS])
> -		return 0;
> -
> -	flags = ntohl(nla_get_be32(ctx->nla[NFTA_TABLE_FLAGS]));
> -	if (flags & ~NFT_TABLE_F_MASK)
> -		return -EOPNOTSUPP;
> +	if (ctx->nla[NFTA_TABLE_FLAGS]) {
> +		flags = ntohl(nla_get_be32(ctx->nla[NFTA_TABLE_FLAGS]));
> +		if (flags & ~NFT_TABLE_F_MASK)
> +			return -EOPNOTSUPP;
> +	}
>
> -	if (flags == ctx->table->flags)
> +	if (flags == ctx->table->flags && nft_userdata_is_same(udata, ctx->table->udata))
>  		return 0;
>
>  	if ((nft_table_has_owner(ctx->table) &&
> @@ -1240,6 +1251,14 @@ static int nf_tables_updtable(struct nft_ctx *ctx)
>  	if (trans == NULL)
>  		return -ENOMEM;
>
> +	if (udata) {
> +		nft_trans_table_udata(trans) = nft_userdata_dup(udata, GFP_KERNEL_ACCOUNT);
> +		if (!nft_trans_table_udata(trans)) {
> +			ret = -ENOMEM;
> +			goto err_table_udata;
> +		}
> +	}
> +
>  	if ((flags & NFT_TABLE_F_DORMANT) &&
>  	    !(ctx->table->flags & NFT_TABLE_F_DORMANT)) {
>  		ctx->table->flags |= NFT_TABLE_F_DORMANT;
> @@ -1271,6 +1290,8 @@ static int nf_tables_updtable(struct nft_ctx *ctx)
>
>  err_register_hooks:
>  	ctx->table->flags |= NFT_TABLE_F_DORMANT;
> +	kfree(nft_trans_table_udata(trans));
> +err_table_udata:
>  	nft_trans_destroy(trans);
>  	return ret;
>  }
> @@ -9378,6 +9399,9 @@ static void nft_obj_commit_update(struct nft_trans *trans)
>  static void nft_commit_release(struct nft_trans *trans)
>  {
>  	switch (trans->msg_type) {
> +	case NFT_MSG_NEWTABLE:
> +		kfree(nft_trans_table_udata(trans));
> +		break;
>  	case NFT_MSG_DELTABLE:
>  	case NFT_MSG_DESTROYTABLE:
>  		nf_tables_table_destroy(&trans->ctx);
> @@ -10140,19 +10164,18 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
>  		switch (trans->msg_type) {
>  		case NFT_MSG_NEWTABLE:
>  			if (nft_trans_table_update(trans)) {
> -				if (!(trans->ctx.table->flags & __NFT_TABLE_F_UPDATE)) {
> -					nft_trans_destroy(trans);
> -					break;
> +				if (trans->ctx.table->flags & __NFT_TABLE_F_UPDATE) {
> +					if (trans->ctx.table->flags & NFT_TABLE_F_DORMANT)
> +						nf_tables_table_disable(net, trans->ctx.table);
> +					trans->ctx.table->flags &= ~__NFT_TABLE_F_UPDATE;
>  				}
> -				if (trans->ctx.table->flags & NFT_TABLE_F_DORMANT)
> -					nf_tables_table_disable(net, trans->ctx.table);
> -
> -				trans->ctx.table->flags &= ~__NFT_TABLE_F_UPDATE;
> +				swap(trans->ctx.table->udata, nft_trans_table_udata(trans));
> +				nf_tables_table_notify(&trans->ctx, NFT_MSG_NEWTABLE);
>  			} else {
>  				nft_clear(net, trans->ctx.table);
> +				nf_tables_table_notify(&trans->ctx, NFT_MSG_NEWTABLE);
> +				nft_trans_destroy(trans);
>  			}
> -			nf_tables_table_notify(&trans->ctx, NFT_MSG_NEWTABLE);
> -			nft_trans_destroy(trans);
>  			break;
>  		case NFT_MSG_DELTABLE:
>  		case NFT_MSG_DESTROYTABLE:
> @@ -10430,6 +10453,7 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action)
>  		switch (trans->msg_type) {
>  		case NFT_MSG_NEWTABLE:
>  			if (nft_trans_table_update(trans)) {
> +				kfree(nft_trans_table_udata(trans));
>  				if (!(trans->ctx.table->flags & __NFT_TABLE_F_UPDATE)) {
>  					nft_trans_destroy(trans);
>  					break;
> --
> 2.39.3 (Apple Git-145)
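The torn-read hazard Pablo describes (udata and udlen updated separately while the dump path reads locklessly) can be illustrated with a small userspace sketch. This is not kernel code: `udata_replace()`/`udata_read()` are hypothetical names, and C11 release/acquire atomics stand in for `rcu_assign_pointer()`/`rcu_dereference()`. The point is the same, though: once both fields live behind one pointer, a single publish step gives readers a consistent (data, len) pair.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Both fields behind one pointer, as in Pablo's struct nft_userdata
 * proposal; readers never see data and len from different versions. */
struct nft_userdata {
	const uint8_t *data;
	uint16_t len;
};

static _Atomic(struct nft_userdata *) table_udata;

/* Writer: fill the new object completely, then publish it with one
 * release store (the rcu_assign_pointer() analogue). Returns the old
 * pointer so the caller can free it after readers are done. */
static struct nft_userdata *udata_replace(struct nft_userdata *new_ud)
{
	return atomic_exchange_explicit(&table_udata, new_ud,
					memory_order_acq_rel);
}

/* Reader: one acquire load (the rcu_dereference() analogue) yields a
 * pointer whose data and len fields belong to the same version. */
static const struct nft_userdata *udata_read(void)
{
	return atomic_load_explicit(&table_udata, memory_order_acquire);
}
```

In the kernel the free of the old object would additionally wait for an RCU grace period; this sketch only shows why the two fields must be published together.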
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Mon, Mar 11, 2024 at 10:14:54PM +0800, Quan Tian wrote:
> > The NFTA_TABLE_USERDATA attribute was ignored on updates. The patch adds
> > handling for it to support table comment updates.
>
> dump path is lockless:
>
> 	if (table->udata) {
> 		if (nla_put(skb, NFTA_TABLE_USERDATA, table->udlen, table->udata))
> 			goto nla_put_failure;
> 	}
>
> there are two things to update at the same time here, table->udata and
> table->udlen.
>
> This needs to be reworked fully if updates are required.

See first patch in the series, it makes this a single pointer, but
you are right...

> then, update struct nft_table to have:
>
> 	struct nft_userdata __rcu *user;

.. this needs an __rcu annotation. I'll respond to patch 1 too.

> BTW, does swap() ensure rcu semantics?

No, this needs to use rcu_replace_pointer() and manual update of the
old one stored in the transaction update.
Quan Tian <tianquan23@gmail.com> wrote:
> -			trans->ctx.table->flags &= ~__NFT_TABLE_F_UPDATE;
> +			swap(trans->ctx.table->udata, nft_trans_table_udata(trans));
> +			nf_tables_table_notify(&trans->ctx, NFT_MSG_NEWTABLE);

I missed this in my review, as Pablo pointed out you can't use swap()
here, because table->udata is rcu protected.

Something like this should work as replacement:

	nft_trans_table_udata(trans) = rcu_replace_pointer(trans->ctx.table->udata,
							   nft_trans_table_udata(trans),
							   lockdep_commit_lock_is_held(trans->ctx.net));

This will swap and ensure all stores are visible.
On Tue, Mar 12, 2024 at 02:01:34PM +0100, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > AFAICS this means that if the table has udata attached, and userspace
> > > makes an update request without a UDATA netlink attribute, we will
> > > delete the existing udata.
> > >
> > > Is that right?
> > >
> > > My question is, should we instead leave the existing udata as-is and not
> > > support removal, only replace?
> >
> > I would leave it in place too if no _USERDATA is specified.

Sure, I will change it in the proposed way.

> > One more question is if the memcmp() with old and new udata makes
> > sense considering two consecutive requests for _USERDATA update in one
> > batch.
>
> Great point, any second udata change request in the same batch must fail.
>
> We learned this the hard way with flag updates :(

Is it the same as two consecutive requests for chain name update and
chain stats update?

In nf_tables_commit():
The 1st trans swaps old udata with 1st new udata;
The 2nd trans swaps 1st new udata with 2nd new udata.

In nft_commit_release():
The 1st trans frees old udata;
The 2nd trans frees 1st new udata.

So multiple udata requests in a batch could work?

Thanks,
Quan
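Quan's swap chain for two udata updates in one batch can be checked mechanically with a plain-C stand-in (no RCU, illustrative names only, not the kernel's structures): each transaction trades its private copy for the table's pointer at commit time, then frees whatever it ended up holding at release time.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the table and the per-transaction udata slot. */
struct udata { int id; };
struct table { struct udata *udata; };
struct trans { struct udata *udata; };

/* Commit step: publish the transaction's copy and stash the previous
 * one back into the transaction (the swap() in nf_tables_commit()). */
static void trans_commit(struct table *t, struct trans *tr)
{
	struct udata *old = t->udata;

	t->udata = tr->udata;
	tr->udata = old;
}

/* Release step: return the pointer that nft_commit_release() would
 * kfree() after the grace period. */
static struct udata *trans_release(struct trans *tr)
{
	struct udata *freed = tr->udata;

	tr->udata = NULL;
	return freed;
}
```

Running the two-transaction scenario through this shows the table ends with the second update's udata, the first release frees the original udata, and the second release frees the first copy, so nothing leaks and nothing is double-freed.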
On Tue, Mar 12, 2024 at 03:10:46PM +0100, Florian Westphal wrote:
> Quan Tian <tianquan23@gmail.com> wrote:
> > -			trans->ctx.table->flags &= ~__NFT_TABLE_F_UPDATE;
> > +			swap(trans->ctx.table->udata, nft_trans_table_udata(trans));
> > +			nf_tables_table_notify(&trans->ctx, NFT_MSG_NEWTABLE);
>
> I missed this in my review, as Pablo pointed out you can't use swap()
> here, because table->udata is rcu protected.
>
> Something like this should work as replacement:
>
> 	nft_trans_table_udata(trans) = rcu_replace_pointer(trans->ctx.table->udata,
> 							   nft_trans_table_udata(trans),
> 							   lockdep_commit_lock_is_held(trans->ctx.net));
>
> This will swap and ensure all stores are visible.

Thank you Pablo and Florian for pointing it out. I will fix it
following your suggestions.

Thanks,
Quan
Quan Tian <tianquan23@gmail.com> wrote:
> In nf_tables_commit():
> The 1st trans swaps old udata with 1st new udata;
> The 2nd trans swaps 1st new udata with 2nd new udata.
>
> In nft_commit_release():
> The 1st trans frees old udata;
> The 2nd trans frees 1st new udata.
>
> So multiple udata requests in a batch could work?

Yes, it could work indeed, but we got bitten by subtle bugs with
back-to-back updates.

If there is a simple way to detect and reject this, then I believe it's
better to disallow it.

Unless you come up with a use-case where such back-to-back udata
updates make sense, of course.
On Tue, Mar 12, 2024 at 10:26:17PM +0800, Quan Tian wrote:
> On Tue, Mar 12, 2024 at 02:01:34PM +0100, Florian Westphal wrote:
> > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > > AFAICS this means that if the table has udata attached, and userspace
> > > > makes an update request without a UDATA netlink attribute, we will
> > > > delete the existing udata.
> > > >
> > > > Is that right?
> > > >
> > > > My question is, should we instead leave the existing udata as-is and not
> > > > support removal, only replace?
> > >
> > > I would leave it in place too if no _USERDATA is specified.
>
> Sure, I will change it in the proposed way.
>
> > > One more question is if the memcmp() with old and new udata makes
> > > sense considering two consecutive requests for _USERDATA update in one
> > > batch.
> >
> > Great point, any second udata change request in the same batch must fail.
> >
> > We learned this the hard way with flag updates :(
>
> Is it the same as two consecutive requests for chain name update and
> chain stats update?
>
> In nf_tables_commit():
> The 1st trans swaps old udata with 1st new udata;
> The 2nd trans swaps 1st new udata with 2nd new udata.
>
> In nft_commit_release():
> The 1st trans frees old udata;
> The 2nd trans frees 1st new udata.

And the abort path simply releases the transaction objects, since udata
was left intact in place.

> So multiple udata requests in a batch could work?

Yes, if rcu is used correctly, that should work fine. But the memcmp()
needs to be removed.
On Tue, Mar 12, 2024 at 03:33:00PM +0100, Florian Westphal wrote:
> Quan Tian <tianquan23@gmail.com> wrote:
> > In nf_tables_commit():
> > The 1st trans swaps old udata with 1st new udata;
> > The 2nd trans swaps 1st new udata with 2nd new udata.
> >
> > In nft_commit_release():
> > The 1st trans frees old udata;
> > The 2nd trans frees 1st new udata.
> >
> > So multiple udata requests in a batch could work?
>
> Yes, it could work indeed but we got bitten by
> subtle bugs with back-to-back updates.

yes, we have seen subtle bugs recently. As for the table flags, the
dormant flag has been particularly a problem. It is a tricky one
because it registers hooks from the preparation step (which might
fail), but it cannot simply register hooks there because the abort
path might need to undo things, and re-registering the hooks could
lead to failures from a later path which does not admit failures. For
the dormant flag, another possibility would be to handle this from the
core, so there is no need to register and unregister hooks; instead
simply set them as inactive.

The dormant flag use case is rather a corner case, but hardware
offload will sooner or later require a mode to run in _hardware mode
only_ (currently it is both hardware and software for nftables), and
considering the hardware offload API has been designed for packet
classifiers from the late 90s (that is, very strictly express your
policy in a linear ruleset), that means dropping packets early is
fine, but wanted traffic gets evaluated in a linear list.

> If there is a simple way to detect and reject
> this then I believe it's better to disallow it.

That requires iterating over the list of transactions, or adding some
kind of flag to reject this.

> Unless you come up with a use-case where such back-to-back
> udata updates make sense of course.

I don't have a use-case for this table userdata myself; it is
currently only used to store comments by userspace. Why someone would
want to update such a comment associated with a table, I don't know.

I would like to know if there are plans to submit similar patches for
other objects. As for sets, this needs to be careful because userdata
contains the set description.
On Tue, Mar 12, 2024 at 11:23:56PM +0100, Pablo Neira Ayuso wrote:
> On Tue, Mar 12, 2024 at 03:33:00PM +0100, Florian Westphal wrote:
> > Quan Tian <tianquan23@gmail.com> wrote:
> > > So multiple udata requests in a batch could work?
> >
> > Yes, it could work indeed but we got bitten by
> > subtle bugs with back-to-back updates.
>
> yes, we have seen subtle bugs recently. As for the table flags, the
> dormant flag has been particularly a problem. It is a tricky one
> because it registers hooks from the preparation step (which might
> fail), but it cannot simply register hooks there because the abort
> path might need to undo things, and re-registering the hooks could
> lead to failures from a later path which does not admit failures.
>
> > If there is a simple way to detect and reject
> > this then I believe it's better to disallow it.
>
> That requires iterating over the list of transactions, or adding some
> kind of flag to reject this.
>
> > Unless you come up with a use-case where such back-to-back
> > udata updates make sense of course.
>
> I don't have a use-case for this table userdata myself; it is
> currently only used to store comments by userspace. Why someone would
> want to update such a comment associated with a table, I don't know.

There was a use-case in kube-proxy: we wanted to use the comment to
store version/compatibility information so we could know whether the
whole table has to be recreated when upgrading to a new version due to
incompatible chain/rule changes (e.g. a chain's hook and priority is
changed). The reason we wanted to avoid recreating the whole table
when it doesn't have to be is that deleting the table would also
destroy the dynamic sets in the table, losing some data.

Another minor reason is that comments could be used to store
human-readable explanations for the objects, and these may change when
the objects' functions change. It would be great if they could be
updated via the "add" operation, otherwise "delete+add" would always
be needed to keep comments up-to-date.

However, I don't know a use-case for back-to-back comment updates. I
will check which one is more complex and risky: rejecting it or
supporting it.

> I would like to know if there are plans to submit similar patches for
> other objects. As for sets, this needs to be careful because userdata
> contains the set description.

As explained above, I think there is some value in supporting comment
updates for other objects, especially when the objects contain dynamic
data. But I understand there are risks like you mentioned, and it
would need to be done more carefully than for tables. I would
volunteer to give it a try for other objects, starting with objects
similar to tables, if that doesn't sound bad to you. Please let me
know if you feel it's not worth it.

Thanks,
Quan
On Wed, Mar 13, 2024 at 09:35:10AM +0800, Quan Tian wrote:
> On Tue, Mar 12, 2024 at 11:23:56PM +0100, Pablo Neira Ayuso wrote:
[...]
> > I don't have a use-case for this table userdata myself; it is
> > currently only used to store comments by userspace. Why someone would
> > want to update such a comment associated with a table, I don't know.
>
> There was a use-case in kube-proxy: we wanted to use the comment to
> store version/compatibility information so we could know whether the
> whole table has to be recreated when upgrading to a new version due to
> incompatible chain/rule changes (e.g. a chain's hook and priority is
> changed). The reason we wanted to avoid recreating the whole table
> when it doesn't have to be is that deleting the table would also
> destroy the dynamic sets in the table, losing some data.

There is a generation number which gets bumped for each ruleset
update, which is currently global.

Would having such a generation ID per table help, or do you need more
flexibility in what needs to be stored in the userdata area?

> Another minor reason is that comments could be used to store
> human-readable explanations for the objects, and these may change when
> the objects' functions change. It would be great if they could be
> updated via the "add" operation, otherwise "delete+add" would always
> be needed to keep comments up-to-date.
>
> However, I don't know a use-case for back-to-back comment updates, I
> will check which one is more complex and risky: rejecting it or
> supporting it.

We have these back-to-back updates everywhere. The original idea was
to allow robots to place all commands in a batch and send it to the
kernel. If the robot adds an object and then, while the same batch is
being collected, another update on that object happens, including one
that cancels the object that was just added, this is accepted, since
such nonsense is valid according to the transactional model.

In this transactional model, batch processing also continues on
errors rather than stopping at the first error. Error unwinding is
special in that sense because objects remain in place and are just
marked as deleted. This allows all errors to be reported to userspace
at once. It is harder than classic netdev interfaces in the kernel.

Florian has concerns that this transactional model is too hard,
because of the recent bug reports. If the direction is to tighten
this, then I think everything should be revised, not just rejecting
this userdata update while everything else still allows back-to-back
updates. This is of course not related to your patch.

> > I would like to know if there are plans to submit similar patches for
> > other objects. As for sets, this needs to be careful because userdata
> > contains the set description.
>
> As explained above, I think there is some value in supporting comment
> updates for other objects, especially when the objects contain dynamic
> data. But I understand there are risks like you mentioned, and it
> would need to be done more carefully than for tables. I would
> volunteer to give it a try for other objects, starting with objects
> similar to tables, if that doesn't sound bad to you. Please let me
> know if you feel it's not worth it.

I am not saying it is not worth it; I am trying to understand your
requirements in case there is a chance to provide an alternative path
to using _USERDATA for this.

Thanks for explaining.
On Wed, Mar 13, 2024 at 10:40:05AM +0100, Pablo Neira Ayuso wrote:
> On Wed, Mar 13, 2024 at 09:35:10AM +0800, Quan Tian wrote:
> > There was a use-case in kube-proxy: we wanted to use the comment to
> > store version/compatibility information so we could know whether the
> > whole table has to be recreated when upgrading to a new version due to
> > incompatible chain/rule changes (e.g. a chain's hook and priority is
> > changed). The reason we wanted to avoid recreating the whole table
> > when it doesn't have to be is that deleting the table would also
> > destroy the dynamic sets in the table, losing some data.
>
> There is a generation number which gets bumped for each ruleset
> update, which is currently global.
>
> Would having such a generation ID per table help, or do you need more
> flexibility in what needs to be stored in the userdata area?

An auto-increased generation ID may not meet the above requirement, as
the table could also be frequently updated by configuration changes at
runtime, not only after application upgrades. As an example scenario,
say we have a loadbalancer application based on nftables:

* LB v0.1.0 installs a collection of nftables rules;
* LB v0.1.1 makes some changes compatible with v0.1.0. In the upgrade
  case, it doesn't need to recreate the whole table if the existing
  rules were installed by version >= 0.1.0;
* LB v0.2.0 makes some changes incompatible with v0.1.x (e.g. some
  chains' priorities are changed), so it needs to recreate the whole
  table when upgrading from v0.1.x;
* LB v0.2.1 makes some changes compatible with v0.2.0, so it doesn't
  need to recreate the table when upgrading from v0.2.0, but does need
  to when upgrading from v0.1.x.

With support for comment updates, we can store, get, and update such
version information in the comment, and use it to determine when it's
necessary to recreate the table.

> > If there is a simple way to detect and reject
> > this then I believe it's better to disallow it.
>
> That requires iterating over the list of transactions, or adding some
> kind of flag to reject this.

I tried to detect back-to-back userdata comment updates, but found it
more complex than I thought. A flag may not work, because it can only
tell whether the 1st update touched the comment, and we don't know
whether the 2nd update is the same as the 1st one unless we iterate
over the list of transactions, which looks a bit overkill? It seems
simpler if we just allow it and accept the last userdata.

> > My question is, should we instead leave the existing udata as-is and not
> > support removal, only replace?
>
> I would leave it in place too if no _USERDATA is specified.

I got a question when trying to implement this. If the update request
specifies USERDATA but its length is 0, do we treat it as not
specified, or remove the existing userdata? Asking because I see
nf_tables_newrule() treats 0 length the same way as unspecified and
doesn't initialize a nft_userdata. That makes sense for create, but
I'm not sure it's right to do the same for update.

And if we want to treat it as "remove comment", we can't simply add a
single pointer of nft_userdata to nft_trans_table to achieve it,
because there would be 3 cases: leave it in place, remove it, or
update it.

Thanks,
Quan
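The three cases Quan distinguishes (attribute absent, zero-length attribute, non-empty attribute) can be made explicit with a small decision helper. This is a hypothetical sketch, not code from the patch: `udata_update_op()` and the enum are invented names, and treating a zero-length attribute as removal is exactly the open policy question in the mail, not settled behavior.

```c
/* Possible outcomes of a table update for the userdata field. */
enum udata_op {
	UDATA_KEEP,	/* no _USERDATA attribute: leave existing as-is */
	UDATA_REMOVE,	/* empty attribute: drop the existing comment   */
	UDATA_REPLACE,	/* non-empty attribute: swap in the new comment */
};

/* attr_present/attr_len describe the NFTA_TABLE_USERDATA attribute in
 * the request. One policy choice (zero length means removal) is coded
 * here purely for illustration. */
static enum udata_op udata_update_op(int attr_present, int attr_len)
{
	if (!attr_present)
		return UDATA_KEEP;
	if (attr_len == 0)
		return UDATA_REMOVE;
	return UDATA_REPLACE;
}
```

A transaction would then need to carry this three-valued state rather than a bare pointer, since a NULL pointer alone cannot distinguish "keep" from "remove".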
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 144dc469ebf8..43c747bd3f80 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -1684,10 +1684,13 @@ struct nft_trans_chain {
 
 struct nft_trans_table {
 	bool				update;
+	struct nlattr			*udata;
 };
 
 #define nft_trans_table_update(trans)	\
 	(((struct nft_trans_table *)trans->data)->update)
+#define nft_trans_table_udata(trans)	\
+	(((struct nft_trans_table *)trans->data)->udata)
 
 struct nft_trans_elem {
 	struct nft_set			*set;
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 85088297dd0d..62a2a1798052 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -188,6 +188,17 @@ static struct nlattr *nft_userdata_dup(const struct nlattr *udata, gfp_t gfp)
 	return kmemdup(udata, nla_total_size(nla_len(udata)), gfp);
 }
 
+static bool nft_userdata_is_same(const struct nlattr *a, const struct nlattr *b)
+{
+	if (!a && !b)
+		return true;
+	if (!a || !b)
+		return false;
+	if (nla_len(a) != nla_len(b))
+		return false;
+	return !memcmp(nla_data(a), nla_data(b), nla_len(a));
+}
+
 static void __nft_set_trans_bind(const struct nft_ctx *ctx, struct nft_set *set,
 				 bool bind)
 {
@@ -1210,16 +1221,16 @@ static int nf_tables_updtable(struct nft_ctx *ctx)
 {
 	struct nft_trans *trans;
 	u32 flags;
+	const struct nlattr *udata = ctx->nla[NFTA_TABLE_USERDATA];
 	int ret;
 
-	if (!ctx->nla[NFTA_TABLE_FLAGS])
-		return 0;
-
-	flags = ntohl(nla_get_be32(ctx->nla[NFTA_TABLE_FLAGS]));
-	if (flags & ~NFT_TABLE_F_MASK)
-		return -EOPNOTSUPP;
+	if (ctx->nla[NFTA_TABLE_FLAGS]) {
+		flags = ntohl(nla_get_be32(ctx->nla[NFTA_TABLE_FLAGS]));
+		if (flags & ~NFT_TABLE_F_MASK)
+			return -EOPNOTSUPP;
+	}
 
-	if (flags == ctx->table->flags)
+	if (flags == ctx->table->flags && nft_userdata_is_same(udata, ctx->table->udata))
 		return 0;
 
 	if ((nft_table_has_owner(ctx->table) &&
@@ -1240,6 +1251,14 @@ static int nf_tables_updtable(struct nft_ctx *ctx)
 	if (trans == NULL)
 		return -ENOMEM;
 
+	if (udata) {
+		nft_trans_table_udata(trans) = nft_userdata_dup(udata, GFP_KERNEL_ACCOUNT);
+		if (!nft_trans_table_udata(trans)) {
+			ret = -ENOMEM;
+			goto err_table_udata;
+		}
+	}
+
 	if ((flags & NFT_TABLE_F_DORMANT) &&
 	    !(ctx->table->flags & NFT_TABLE_F_DORMANT)) {
 		ctx->table->flags |= NFT_TABLE_F_DORMANT;
@@ -1271,6 +1290,8 @@ static int nf_tables_updtable(struct nft_ctx *ctx)
 
 err_register_hooks:
 	ctx->table->flags |= NFT_TABLE_F_DORMANT;
+	kfree(nft_trans_table_udata(trans));
+err_table_udata:
 	nft_trans_destroy(trans);
 	return ret;
 }
@@ -9378,6 +9399,9 @@ static void nft_obj_commit_update(struct nft_trans *trans)
 static void nft_commit_release(struct nft_trans *trans)
 {
 	switch (trans->msg_type) {
+	case NFT_MSG_NEWTABLE:
+		kfree(nft_trans_table_udata(trans));
+		break;
 	case NFT_MSG_DELTABLE:
 	case NFT_MSG_DESTROYTABLE:
 		nf_tables_table_destroy(&trans->ctx);
@@ -10140,19 +10164,18 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
 		switch (trans->msg_type) {
 		case NFT_MSG_NEWTABLE:
 			if (nft_trans_table_update(trans)) {
-				if (!(trans->ctx.table->flags & __NFT_TABLE_F_UPDATE)) {
-					nft_trans_destroy(trans);
-					break;
+				if (trans->ctx.table->flags & __NFT_TABLE_F_UPDATE) {
+					if (trans->ctx.table->flags & NFT_TABLE_F_DORMANT)
+						nf_tables_table_disable(net, trans->ctx.table);
+					trans->ctx.table->flags &= ~__NFT_TABLE_F_UPDATE;
 				}
-				if (trans->ctx.table->flags & NFT_TABLE_F_DORMANT)
-					nf_tables_table_disable(net, trans->ctx.table);
-
-				trans->ctx.table->flags &= ~__NFT_TABLE_F_UPDATE;
+				swap(trans->ctx.table->udata, nft_trans_table_udata(trans));
+				nf_tables_table_notify(&trans->ctx, NFT_MSG_NEWTABLE);
 			} else {
 				nft_clear(net, trans->ctx.table);
+				nf_tables_table_notify(&trans->ctx, NFT_MSG_NEWTABLE);
+				nft_trans_destroy(trans);
 			}
-			nf_tables_table_notify(&trans->ctx, NFT_MSG_NEWTABLE);
-			nft_trans_destroy(trans);
 			break;
 		case NFT_MSG_DELTABLE:
 		case NFT_MSG_DESTROYTABLE:
@@ -10430,6 +10453,7 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action)
 		switch (trans->msg_type) {
 		case NFT_MSG_NEWTABLE:
 			if (nft_trans_table_update(trans)) {
+				kfree(nft_trans_table_udata(trans));
 				if (!(trans->ctx.table->flags & __NFT_TABLE_F_UPDATE)) {
 					nft_trans_destroy(trans);
 					break;
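The patch's nft_userdata_is_same() comparison can be exercised outside the kernel with a standalone version that operates on plain buffers instead of struct nlattr. A sketch under that assumption (the helper name and signature are adapted for userspace, not the kernel's):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same decision logic as the patch's nft_userdata_is_same(): two udata
 * values match only if both are absent, or both present with equal
 * length and equal bytes. */
static bool userdata_is_same(const void *a, size_t alen,
			     const void *b, size_t blen)
{
	if (!a && !b)
		return true;	/* both absent */
	if (!a || !b)
		return false;	/* present vs absent */
	if (alen != blen)
		return false;	/* different lengths cannot match */
	return !memcmp(a, b, alen);
}
```

This is the check nf_tables_updtable() uses to turn a no-op update into an early `return 0`; per the discussion above, it would have to be dropped if back-to-back updates within one batch are allowed.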
The NFTA_TABLE_USERDATA attribute was ignored on updates. The patch adds
handling for it to support table comment updates.

Signed-off-by: Quan Tian <tianquan23@gmail.com>
---
v2: Change to store userdata in struct nlattr * to ensure atomic update
v3: Do not call nft_trans_destroy() on table updates in nf_tables_commit()

 include/net/netfilter/nf_tables.h |  3 ++
 net/netfilter/nf_tables_api.c     | 56 ++++++++++++++++++++++---------
 2 files changed, 43 insertions(+), 16 deletions(-)