Message ID | 1473602728-33502-2-git-send-email-zlpnobody@163.com |
---|---|
State | Changes Requested |
Delegated to: | Pablo Neira |
Headers | show |
Liping Zhang <zlpnobody@163.com> wrote: > From: Liping Zhang <liping.zhang@spreadtrum.com> > > Currently, the user can specify the queue numbers by _QUEUE_NUM and > _QUEUE_TOTAL attributes, this is enough in most situations. > > But acctually, it is not very flexible, for example: > tcp dport 80 mapped to queue0 > tcp dport 81 mapped to queue1 > tcp dport 82 mapped to queue2 > In order to do this thing, we must add 3 nft rules, and more > mapping means more rules ... > > So similer to nft_nat, take two registers to select the queue numbers, > then we can add one simple rule to mapping queues, maybe like this: > queue num tcp dport map { 80:0, 81:1, 82:2 ... } I like this. My first thought was that it would be better to just support one single sreg (the queue number) and eventually externalize the hashing/queue selection: queue num jhash ip saddr . ip daddr mod ... Problem is that with plain jhash we won't get a symmetric hash for origin and reply, so for this we would need a new expression/hash mode. We would also need another expression to allow distribution starting with a queue other than 0. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2016-09-12 5:12 GMT+08:00 Florian Westphal <fw@strlen.de>: > Liping Zhang <zlpnobody@163.com> wrote: >> So similer to nft_nat, take two registers to select the queue numbers, >> then we can add one simple rule to mapping queues, maybe like this: >> queue num tcp dport map { 80:0, 81:1, 82:2 ... } > > I like this. > > My first thought was that it would be better to just support one single > sreg (the queue number) and eventually externalize the hashing/queue > selection: > > queue num jhash ip saddr . ip daddr mod ... Sounds good. At first, my another intention is use _SREG_FROM and _SREG_TO to replace _QUEUE_NUM and _QUEUE_TOTAL, there's no restriction to use range 0-65535: [ immediate reg 1 0x00000000 ] [ immediate reg 2 0x0000ffff ] [ queue num 0 reg_from 1 reg_to 2 ] But I think your "queue num jhash ip saddr . ip daddr mod ..." is more flexible and there's no restriction to use range 0-65535 too. I agree with you, one sreg seems enough. I will send V2 later. > Problem is that with plain jhash we won't get a symmetric hash > for origin and reply, so for this we would need a new expression/hash > mode. > > We would also need another expression to allow distribution > starting with a queue other than 0. I think Laura is developing this option, see https://patchwork.ozlabs.org/patch/666334/. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Sep 11, 2016 at 11:12:26PM +0200, Florian Westphal wrote: > Liping Zhang <zlpnobody@163.com> wrote: > > From: Liping Zhang <liping.zhang@spreadtrum.com> > > > > Currently, the user can specify the queue numbers by _QUEUE_NUM and > > _QUEUE_TOTAL attributes, this is enough in most situations. > > > > But acctually, it is not very flexible, for example: > > tcp dport 80 mapped to queue0 > > tcp dport 81 mapped to queue1 > > tcp dport 82 mapped to queue2 > > In order to do this thing, we must add 3 nft rules, and more > > mapping means more rules ... > > > > So similer to nft_nat, take two registers to select the queue numbers, > > then we can add one simple rule to mapping queues, maybe like this: > > queue num tcp dport map { 80:0, 81:1, 82:2 ... } > > I like this. > > My first thought was that it would be better to just support one single > sreg (the queue number) and eventually externalize the hashing/queue > selection: > > queue num jhash ip saddr . ip daddr mod ... > > Problem is that with plain jhash we won't get a symmetric hash > for origin and reply, so for this we would need a new expression/hash > mode. > > We would also need another expression to allow distribution > starting with a queue other than 0. For such feature, I've already sent a patch "netfilter: nft_hash: Add hash offset value" in order to set an initial value for the hash expression. I think it'll be available in the tree soon. > -- > To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Sep 11, 2016 at 11:12:26PM +0200, Florian Westphal wrote: > Liping Zhang <zlpnobody@163.com> wrote: > > From: Liping Zhang <liping.zhang@spreadtrum.com> > > > > Currently, the user can specify the queue numbers by _QUEUE_NUM and > > _QUEUE_TOTAL attributes, this is enough in most situations. > > > > But acctually, it is not very flexible, for example: > > tcp dport 80 mapped to queue0 > > tcp dport 81 mapped to queue1 > > tcp dport 82 mapped to queue2 > > In order to do this thing, we must add 3 nft rules, and more > > mapping means more rules ... > > > > So similer to nft_nat, take two registers to select the queue numbers, > > then we can add one simple rule to mapping queues, maybe like this: > > queue num tcp dport map { 80:0, 81:1, 82:2 ... } > > I like this. > > My first thought was that it would be better to just support one single > sreg (the queue number) and eventually externalize the hashing/queue > selection: > > queue num jhash ip saddr . ip daddr mod ... > > Problem is that with plain jhash we won't get a symmetric hash > for origin and reply, so for this we would need a new expression/hash > mode. Are you think of xor hashing to provide the symmetry? Downside is that bad tuple selection may result in poor distribution, but this is something we can document. Still it would be possible to use jhash at the cost of having two rules. It would be great to get a new NFTA_HASH_MODE attribute and something like: enum nft_hash_types { NFT_HASH_JENKINS = 0, }; So we have room to make other hash algos fit into this expression. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Pablo Neira Ayuso <pablo@netfilter.org> wrote: > On Sun, Sep 11, 2016 at 11:12:26PM +0200, Florian Westphal wrote: > > My first thought was that it would be better to just support one single > > sreg (the queue number) and eventually externalize the hashing/queue > > selection: > > > > queue num jhash ip saddr . ip daddr mod ... > > > > Problem is that with plain jhash we won't get a symmetric hash > > for origin and reply, so for this we would need a new expression/hash > > mode. > > Are you think of xor hashing to provide the symmetry? Downside is that > bad tuple selection may result in poor distribution, but this is > something we can document. No, I was thinking of a new hash mode to do this, e.g. just do same what current nfqueue selection does: hash lower address first. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Laura Garcia <nevola@gmail.com> wrote: > On Sun, Sep 11, 2016 at 11:12:26PM +0200, Florian Westphal wrote: > > Liping Zhang <zlpnobody@163.com> wrote: > > > From: Liping Zhang <liping.zhang@spreadtrum.com> > > > > > > Currently, the user can specify the queue numbers by _QUEUE_NUM and > > > _QUEUE_TOTAL attributes, this is enough in most situations. > > > > > > But acctually, it is not very flexible, for example: > > > tcp dport 80 mapped to queue0 > > > tcp dport 81 mapped to queue1 > > > tcp dport 82 mapped to queue2 > > > In order to do this thing, we must add 3 nft rules, and more > > > mapping means more rules ... > > > > > > So similer to nft_nat, take two registers to select the queue numbers, > > > then we can add one simple rule to mapping queues, maybe like this: > > > queue num tcp dport map { 80:0, 81:1, 82:2 ... } > > > > I like this. > > > > My first thought was that it would be better to just support one single > > sreg (the queue number) and eventually externalize the hashing/queue > > selection: > > > > queue num jhash ip saddr . ip daddr mod ... > > > > Problem is that with plain jhash we won't get a symmetric hash > > for origin and reply, so for this we would need a new expression/hash > > mode. > > > > We would also need another expression to allow distribution > > starting with a queue other than 0. > > For such feature, I've already sent a patch "netfilter: nft_hash: Add > hash offset value" in order to set an initial value for the hash > expression. I think it'll be available in the tree soon. Yes, but that means one would always need to use hash. Maybe thats indeed sufficient since it already provides a modulo functionality. I was thinking nfqueue num meta cpu ... nfqueue num meta hash ... [ i.e. use skb->hash, not supported right now ] nfqueue num meta mark nfqueue num ct mark etc. so we would have no control over start qnum (something like 'nfqueue num (meta cpu + 10000)' ... Admittingly, I do not see a big drawback if we'd always have to feed it to jhash first. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Sep 12, 2016 at 02:22:57PM +0200, Florian Westphal wrote: > Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > On Sun, Sep 11, 2016 at 11:12:26PM +0200, Florian Westphal wrote: > > > My first thought was that it would be better to just support one single > > > sreg (the queue number) and eventually externalize the hashing/queue > > > selection: > > > > > > queue num jhash ip saddr . ip daddr mod ... > > > > > > Problem is that with plain jhash we won't get a symmetric hash > > > for origin and reply, so for this we would need a new expression/hash > > > mode. > > > > Are you think of xor hashing to provide the symmetry? Downside is that > > bad tuple selection may result in poor distribution, but this is > > something we can document. > > No, I was thinking of a new hash mode to do this, e.g. just do same > what current nfqueue selection does: hash lower address first. Currently we have one single register pointing to the entire tuple concatenation that we hash, we would need to support multiple registers as input, check that they are consecutive. Then, the logic to compare the data. And a way to express this in syntax. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Sep 12, 2016 at 02:28:49PM +0200, Florian Westphal wrote: > Laura Garcia <nevola@gmail.com> wrote: > > On Sun, Sep 11, 2016 at 11:12:26PM +0200, Florian Westphal wrote: > > > Liping Zhang <zlpnobody@163.com> wrote: > > > > From: Liping Zhang <liping.zhang@spreadtrum.com> > > > > > > > > Currently, the user can specify the queue numbers by _QUEUE_NUM and > > > > _QUEUE_TOTAL attributes, this is enough in most situations. > > > > > > > > But acctually, it is not very flexible, for example: > > > > tcp dport 80 mapped to queue0 > > > > tcp dport 81 mapped to queue1 > > > > tcp dport 82 mapped to queue2 > > > > In order to do this thing, we must add 3 nft rules, and more > > > > mapping means more rules ... > > > > > > > > So similer to nft_nat, take two registers to select the queue numbers, > > > > then we can add one simple rule to mapping queues, maybe like this: > > > > queue num tcp dport map { 80:0, 81:1, 82:2 ... } > > > > > > I like this. > > > > > > My first thought was that it would be better to just support one single > > > sreg (the queue number) and eventually externalize the hashing/queue > > > selection: > > > > > > queue num jhash ip saddr . ip daddr mod ... > > > > > > Problem is that with plain jhash we won't get a symmetric hash > > > for origin and reply, so for this we would need a new expression/hash > > > mode. > > > > > > We would also need another expression to allow distribution > > > starting with a queue other than 0. > > > > For such feature, I've already sent a patch "netfilter: nft_hash: Add > > hash offset value" in order to set an initial value for the hash > > expression. I think it'll be available in the tree soon. > > Yes, but that means one would always need to use hash. > > Maybe thats indeed sufficient since it already provides a modulo > functionality. > > I was thinking > > nfqueue num meta cpu ... > nfqueue num meta hash ... [ i.e. use skb->hash, not supported right now ] > nfqueue num meta mark > nfqueue num ct mark > > etc. so we would have no control over start qnum (something like > 'nfqueue num (meta cpu + 10000)' ... > > Admittingly, I do not see a big drawback if we'd always have to > feed it to jhash first. I think it would be easy to add this offset support to meta and ct, so we adding this value to whatever we fetch. I think a generic arithmetic expression that allows us to add two values from two registers is too much. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Pablo Neira Ayuso <pablo@netfilter.org> wrote: > On Mon, Sep 12, 2016 at 02:22:57PM +0200, Florian Westphal wrote: > > Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > > On Sun, Sep 11, 2016 at 11:12:26PM +0200, Florian Westphal wrote: > > > > My first thought was that it would be better to just support one single > > > > sreg (the queue number) and eventually externalize the hashing/queue > > > > selection: > > > > > > > > queue num jhash ip saddr . ip daddr mod ... > > > > > > > > Problem is that with plain jhash we won't get a symmetric hash > > > > for origin and reply, so for this we would need a new expression/hash > > > > mode. > > > > > > Are you think of xor hashing to provide the symmetry? Downside is that > > > bad tuple selection may result in poor distribution, but this is > > > something we can document. > > > > No, I was thinking of a new hash mode to do this, e.g. just do same > > what current nfqueue selection does: hash lower address first. > > Currently we have one single register pointing to the entire tuple > concatenation that we hash, we would need to support multiple > registers as input, check that they are consecutive. Then, the logic > to compare the data. And a way to express this in syntax. Ugh. Ok, lets ignore this for now. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Liping, A bit more comments on top of Florian's suggestion to use one single _SREG. On Sun, Sep 11, 2016 at 10:05:28PM +0800, Liping Zhang wrote: > diff --git a/net/netfilter/nft_queue.c b/net/netfilter/nft_queue.c > index d16d599..6557118 100644 > --- a/net/netfilter/nft_queue.c > +++ b/net/netfilter/nft_queue.c > @@ -22,9 +22,11 @@ > static u32 jhash_initval __read_mostly; > > struct nft_queue { > - u16 queuenum; > - u16 queues_total; > - u16 flags; > + enum nft_registers sreg_from:8; > + enum nft_registers sreg_to:8; > + u16 queuenum; > + u16 queues_total; > + u16 flags; > }; > > static void nft_queue_eval(const struct nft_expr *expr, > @@ -32,17 +34,30 @@ static void nft_queue_eval(const struct nft_expr *expr, > const struct nft_pktinfo *pkt) > { > struct nft_queue *priv = nft_expr_priv(expr); > - u32 queue = priv->queuenum; > + u32 queue, queues_total, queue_end; > u32 ret; > > - if (priv->queues_total > 1) { > + if (priv->sreg_from) { > + queue = (u16)regs->data[priv->sreg_from]; > + queue_end = (u16)regs->data[priv->sreg_to]; > + > + if (queue_end > queue) > + queues_total = queue_end - queue + 1; > + else > + queues_total = 1; > + } else { > + queue = priv->queuenum; > + queues_total = priv->queues_total; > + } I suggest you use .select_ops to use a specific _eval function if _SREG is set. This would disentangle this code a bit. > + > + if (queues_total > 1) { > if (priv->flags & NFT_QUEUE_FLAG_CPU_FANOUT) { > int cpu = smp_processor_id(); > > - queue = priv->queuenum + cpu % priv->queues_total; > + queue += cpu % queues_total; > } else { > queue = nfqueue_hash(pkt->skb, queue, > - priv->queues_total, pkt->pf, > + queues_total, pkt->pf, > jhash_initval); > } > } > @@ -58,6 +73,8 @@ static const struct nla_policy nft_queue_policy[NFTA_QUEUE_MAX + 1] = { > [NFTA_QUEUE_NUM] = { .type = NLA_U16 }, > [NFTA_QUEUE_TOTAL] = { .type = NLA_U16 }, > [NFTA_QUEUE_FLAGS] = { .type = NLA_U16 }, > + [NFTA_QUEUE_SREG_FROM] = { .type = NLA_U32 }, > + [NFTA_QUEUE_SREG_TO] = { .type = NLA_U32 }, > }; > > static int nft_queue_init(const struct nft_ctx *ctx, > @@ -66,30 +83,58 @@ static int nft_queue_init(const struct nft_ctx *ctx, > { > struct nft_queue *priv = nft_expr_priv(expr); > u32 maxid; > + int err; > > - if (tb[NFTA_QUEUE_NUM] == NULL) > + if (!tb[NFTA_QUEUE_NUM] && !tb[NFTA_QUEUE_SREG_FROM]) > return -EINVAL; > > init_hashrandom(&jhash_initval); > - priv->queuenum = ntohs(nla_get_be16(tb[NFTA_QUEUE_NUM])); > > - if (tb[NFTA_QUEUE_TOTAL] != NULL) > - priv->queues_total = ntohs(nla_get_be16(tb[NFTA_QUEUE_TOTAL])); > - else > - priv->queues_total = 1; > + /* nftables has no idea whether the kernel supports _SREG_FROM or not, > + * so for compatibility reason, it may specify the _SREG_FROM and > + * _QUEUE_NUM attributes at the same time. We prefer to use _SREG_FROM, > + * it is more flexible and has less restriction, for example, queue > + * range 0-65535 is ok when use _SREG_FROM and _SREG_TO. > + */ No need for this large comment, look. From userspace, we have to modify the nft parser to take an expression as 'num'. Then, from the netlink_linearize path, we can check the expression type: * If it's a value expression, we can simply use NFTA_QUEUE_NUM. * If it's a range expression, we have to set up NFTA_QUEUE_NUM and NFTA_QUEUE_TOTAL. * If it's a mapping, then we only use _SREG. Old kernels will bail out if the mapping is used, as this is not supported, which is what we expect. This should be fine by now, until I finish the patchset to provide a VM description that userspace can use to generate bytecode depending on the features available in the kernel. > + if (tb[NFTA_QUEUE_SREG_FROM]) { > + priv->sreg_from = nft_parse_register(tb[NFTA_QUEUE_SREG_FROM]); > + err = nft_validate_register_load(priv->sreg_from, sizeof(u16)); > + if (err < 0) > + return err; > + > + if (tb[NFTA_QUEUE_SREG_TO]) { > + priv->sreg_to = > + nft_parse_register(tb[NFTA_QUEUE_SREG_TO]); > + err = nft_validate_register_load(priv->sreg_to, > + sizeof(u16)); > + if (err < 0) > + return err; > + } else { > + priv->sreg_to = priv->sreg_from; > + } > + } else if (tb[NFTA_QUEUE_NUM]) { > + priv->queuenum = ntohs(nla_get_be16(tb[NFTA_QUEUE_NUM])); > > - if (priv->queues_total == 0) > - return -EINVAL; > + if (tb[NFTA_QUEUE_TOTAL]) > + priv->queues_total = > + ntohs(nla_get_be16(tb[NFTA_QUEUE_TOTAL])); > + else > + priv->queues_total = 1; > > - maxid = priv->queues_total - 1 + priv->queuenum; > - if (maxid > U16_MAX) > - return -ERANGE; > + if (priv->queues_total == 0) > + return -EINVAL; > + > + maxid = priv->queues_total - 1 + priv->queuenum; > + if (maxid > U16_MAX) > + return -ERANGE; > + } > > if (tb[NFTA_QUEUE_FLAGS] != NULL) { > priv->flags = ntohs(nla_get_be16(tb[NFTA_QUEUE_FLAGS])); > if (priv->flags & ~NFT_QUEUE_FLAG_MASK) > return -EINVAL; > } > + > return 0; > } > > @@ -97,9 +142,21 @@ static int nft_queue_dump(struct sk_buff *skb, const struct nft_expr *expr) > { > const struct nft_queue *priv = nft_expr_priv(expr); > > - if (nla_put_be16(skb, NFTA_QUEUE_NUM, htons(priv->queuenum)) || > - nla_put_be16(skb, NFTA_QUEUE_TOTAL, htons(priv->queues_total)) || > - nla_put_be16(skb, NFTA_QUEUE_FLAGS, htons(priv->flags))) > + if (priv->sreg_from) { > + if (nft_dump_register(skb, NFTA_QUEUE_SREG_FROM, > + priv->sreg_from)) > + goto nla_put_failure; > + if (nft_dump_register(skb, NFTA_QUEUE_SREG_TO, > + priv->sreg_to)) > + goto nla_put_failure; > + } else { > + if (nla_put_be16(skb, NFTA_QUEUE_NUM, htons(priv->queuenum)) || > + nla_put_be16(skb, NFTA_QUEUE_TOTAL, > + htons(priv->queues_total))) > + goto nla_put_failure; > + } Looking at this, the code will look better if we should use .select_ops indeed. Thanks for resolving existing nft_queue limitations! -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2016-09-13 17:19 GMT+08:00 Pablo Neira Ayuso <pablo@netfilter.org>: > Hi Liping, > > A bit more comments on top of Florian's suggestion to use one single > _SREG. > > On Sun, Sep 11, 2016 at 10:05:28PM +0800, Liping Zhang wrote: >> diff --git a/net/netfilter/nft_queue.c b/net/netfilter/nft_queue.c >> static void nft_queue_eval(const struct nft_expr *expr, >> @@ -32,17 +34,30 @@ static void nft_queue_eval(const struct nft_expr *expr, >> const struct nft_pktinfo *pkt) >> { >> struct nft_queue *priv = nft_expr_priv(expr); >> - u32 queue = priv->queuenum; >> + u32 queue, queues_total, queue_end; >> u32 ret; >> >> - if (priv->queues_total > 1) { >> + if (priv->sreg_from) { >> + queue = (u16)regs->data[priv->sreg_from]; >> + queue_end = (u16)regs->data[priv->sreg_to]; >> + >> + if (queue_end > queue) >> + queues_total = queue_end - queue + 1; >> + else >> + queues_total = 1; >> + } else { >> + queue = priv->queuenum; >> + queues_total = priv->queues_total; >> + } > > I suggest you use .select_ops to use a specific _eval function if > _SREG is set. This would disentangle this code a bit. > >> >> + /* nftables has no idea whether the kernel supports _SREG_FROM or not, >> + * so for compatibility reason, it may specify the _SREG_FROM and >> + * _QUEUE_NUM attributes at the same time. We prefer to use _SREG_FROM, >> + * it is more flexible and has less restriction, for example, queue >> + * range 0-65535 is ok when use _SREG_FROM and _SREG_TO. >> + */ > > No need for this large comment, look. > > From userspace, we have to modify the nft parser to take an expression > as 'num'. Then, from the netlink_linearize path, we can check the > expression type: > > * If it's a value expression, we can simply use NFTA_QUEUE_NUM. > * If it's a range expression, we have to set up NFTA_QUEUE_NUM and > NFTA_QUEUE_TOTAL. > * If it's a mapping, then we only use _SREG. > > Old kernels will bail out if the mapping is used, as this is not > supported, which is what we expect. This should be fine by now, until > I finish the patchset to provide a VM description that userspace can > use to generate bytecode depending on the features available in the > kernel. > >> >> @@ -97,9 +142,21 @@ static int nft_queue_dump(struct sk_buff *skb, const struct nft_expr *expr) >> { >> const struct nft_queue *priv = nft_expr_priv(expr); >> >> - if (nla_put_be16(skb, NFTA_QUEUE_NUM, htons(priv->queuenum)) || >> - nla_put_be16(skb, NFTA_QUEUE_TOTAL, htons(priv->queues_total)) || >> - nla_put_be16(skb, NFTA_QUEUE_FLAGS, htons(priv->flags))) >> + if (priv->sreg_from) { >> + if (nft_dump_register(skb, NFTA_QUEUE_SREG_FROM, >> + priv->sreg_from)) >> + goto nla_put_failure; >> + if (nft_dump_register(skb, NFTA_QUEUE_SREG_TO, >> + priv->sreg_to)) >> + goto nla_put_failure; >> + } else { >> + if (nla_put_be16(skb, NFTA_QUEUE_NUM, htons(priv->queuenum)) || >> + nla_put_be16(skb, NFTA_QUEUE_TOTAL, >> + htons(priv->queues_total))) >> + goto nla_put_failure; >> + } > > Looking at this, the code will look better if we should use > .select_ops indeed. > > Thanks for resolving existing nft_queue limitations! I will re-spin my patch based on your and Florian's suggestions. Thanks Pablo. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h index 24161e2..5b4e373 100644 --- a/include/uapi/linux/netfilter/nf_tables.h +++ b/include/uapi/linux/netfilter/nf_tables.h @@ -886,12 +886,16 @@ enum nft_log_attributes { * @NFTA_QUEUE_NUM: netlink queue to send messages to (NLA_U16) * @NFTA_QUEUE_TOTAL: number of queues to load balance packets on (NLA_U16) * @NFTA_QUEUE_FLAGS: various flags (NLA_U16) + * @NFTA_QUEUE_SREG_FROM: source register of queue number start (NLA_U32: nft_registers) + * @NFTA_QUEUE_SREG_TO: source register of queue number end (NLA_U32: nft_registers) */ enum nft_queue_attributes { NFTA_QUEUE_UNSPEC, NFTA_QUEUE_NUM, NFTA_QUEUE_TOTAL, NFTA_QUEUE_FLAGS, + NFTA_QUEUE_SREG_FROM, + NFTA_QUEUE_SREG_TO, __NFTA_QUEUE_MAX }; #define NFTA_QUEUE_MAX (__NFTA_QUEUE_MAX - 1) diff --git a/net/netfilter/nft_queue.c b/net/netfilter/nft_queue.c index d16d599..6557118 100644 --- a/net/netfilter/nft_queue.c +++ b/net/netfilter/nft_queue.c @@ -22,9 +22,11 @@ static u32 jhash_initval __read_mostly; struct nft_queue { - u16 queuenum; - u16 queues_total; - u16 flags; + enum nft_registers sreg_from:8; + enum nft_registers sreg_to:8; + u16 queuenum; + u16 queues_total; + u16 flags; }; static void nft_queue_eval(const struct nft_expr *expr, @@ -32,17 +34,30 @@ static void nft_queue_eval(const struct nft_expr *expr, const struct nft_pktinfo *pkt) { struct nft_queue *priv = nft_expr_priv(expr); - u32 queue = priv->queuenum; + u32 queue, queues_total, queue_end; u32 ret; - if (priv->queues_total > 1) { + if (priv->sreg_from) { + queue = (u16)regs->data[priv->sreg_from]; + queue_end = (u16)regs->data[priv->sreg_to]; + + if (queue_end > queue) + queues_total = queue_end - queue + 1; + else + queues_total = 1; + } else { + queue = priv->queuenum; + queues_total = priv->queues_total; + } + + if (queues_total > 1) { if (priv->flags & NFT_QUEUE_FLAG_CPU_FANOUT) { int cpu = smp_processor_id(); - queue = priv->queuenum + cpu % priv->queues_total; + queue += cpu % queues_total; } else { queue = nfqueue_hash(pkt->skb, queue, - priv->queues_total, pkt->pf, + queues_total, pkt->pf, jhash_initval); } } @@ -58,6 +73,8 @@ static const struct nla_policy nft_queue_policy[NFTA_QUEUE_MAX + 1] = { [NFTA_QUEUE_NUM] = { .type = NLA_U16 }, [NFTA_QUEUE_TOTAL] = { .type = NLA_U16 }, [NFTA_QUEUE_FLAGS] = { .type = NLA_U16 }, + [NFTA_QUEUE_SREG_FROM] = { .type = NLA_U32 }, + [NFTA_QUEUE_SREG_TO] = { .type = NLA_U32 }, }; static int nft_queue_init(const struct nft_ctx *ctx, @@ -66,30 +83,58 @@ static int nft_queue_init(const struct nft_ctx *ctx, { struct nft_queue *priv = nft_expr_priv(expr); u32 maxid; + int err; - if (tb[NFTA_QUEUE_NUM] == NULL) + if (!tb[NFTA_QUEUE_NUM] && !tb[NFTA_QUEUE_SREG_FROM]) return -EINVAL; init_hashrandom(&jhash_initval); - priv->queuenum = ntohs(nla_get_be16(tb[NFTA_QUEUE_NUM])); - if (tb[NFTA_QUEUE_TOTAL] != NULL) - priv->queues_total = ntohs(nla_get_be16(tb[NFTA_QUEUE_TOTAL])); - else - priv->queues_total = 1; + /* nftables has no idea whether the kernel supports _SREG_FROM or not, + * so for compatibility reason, it may specify the _SREG_FROM and + * _QUEUE_NUM attributes at the same time. We prefer to use _SREG_FROM, + * it is more flexible and has less restriction, for example, queue + * range 0-65535 is ok when use _SREG_FROM and _SREG_TO. + */ + if (tb[NFTA_QUEUE_SREG_FROM]) { + priv->sreg_from = nft_parse_register(tb[NFTA_QUEUE_SREG_FROM]); + err = nft_validate_register_load(priv->sreg_from, sizeof(u16)); + if (err < 0) + return err; + + if (tb[NFTA_QUEUE_SREG_TO]) { + priv->sreg_to = + nft_parse_register(tb[NFTA_QUEUE_SREG_TO]); + err = nft_validate_register_load(priv->sreg_to, + sizeof(u16)); + if (err < 0) + return err; + } else { + priv->sreg_to = priv->sreg_from; + } + } else if (tb[NFTA_QUEUE_NUM]) { + priv->queuenum = ntohs(nla_get_be16(tb[NFTA_QUEUE_NUM])); - if (priv->queues_total == 0) - return -EINVAL; + if (tb[NFTA_QUEUE_TOTAL]) + priv->queues_total = + ntohs(nla_get_be16(tb[NFTA_QUEUE_TOTAL])); + else + priv->queues_total = 1; - maxid = priv->queues_total - 1 + priv->queuenum; - if (maxid > U16_MAX) - return -ERANGE; + if (priv->queues_total == 0) + return -EINVAL; + + maxid = priv->queues_total - 1 + priv->queuenum; + if (maxid > U16_MAX) + return -ERANGE; + } if (tb[NFTA_QUEUE_FLAGS] != NULL) { priv->flags = ntohs(nla_get_be16(tb[NFTA_QUEUE_FLAGS])); if (priv->flags & ~NFT_QUEUE_FLAG_MASK) return -EINVAL; } + return 0; } @@ -97,9 +142,21 @@ static int nft_queue_dump(struct sk_buff *skb, const struct nft_expr *expr) { const struct nft_queue *priv = nft_expr_priv(expr); - if (nla_put_be16(skb, NFTA_QUEUE_NUM, htons(priv->queuenum)) || - nla_put_be16(skb, NFTA_QUEUE_TOTAL, htons(priv->queues_total)) || - nla_put_be16(skb, NFTA_QUEUE_FLAGS, htons(priv->flags))) + if (priv->sreg_from) { + if (nft_dump_register(skb, NFTA_QUEUE_SREG_FROM, + priv->sreg_from)) + goto nla_put_failure; + if (nft_dump_register(skb, NFTA_QUEUE_SREG_TO, + priv->sreg_to)) + goto nla_put_failure; + } else { + if (nla_put_be16(skb, NFTA_QUEUE_NUM, htons(priv->queuenum)) || + nla_put_be16(skb, NFTA_QUEUE_TOTAL, + htons(priv->queues_total))) + goto nla_put_failure; + } + + if (nla_put_be16(skb, NFTA_QUEUE_FLAGS, htons(priv->flags))) goto nla_put_failure; return 0;