Message ID | 8b1e8907e8dd9a51f2e40b39cd5f5c2b0eae94fd.1436574843.git.daniel@iogearbox.net |
---|---|
State | Changes Requested |
Delegated to: | Pablo Neira |
Headers | show |
On Sat, Jul 11, 2015 at 03:14:07AM +0200, Daniel Borkmann wrote: > This work adds the possibility of deriving the zone id from the skb->mark > field in a scalable manner. This allows for having only a single template > serving 100s .. 1000s of different zones, for example, instead of needing > to have one match for each zone as an extra CT jump target. Note that we'd > need to have this information attached to the template as at the time when > we're trying to lookup a possible ct object, we already need to know zone > information for a possible match when going into __nf_conntrack_find_get(). > This work provides a minimal implementation for a possible mapping. I think connmark is a better place for this feature, given that the zone is a ct extension. Moreover, I guess it will not take long until someone sends us a patch to perform some bitwise operation to only fetch some of the skb->mark bits into the zone. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/15/2015 07:50 PM, Pablo Neira Ayuso wrote: > On Sat, Jul 11, 2015 at 03:14:07AM +0200, Daniel Borkmann wrote: >> This work adds the possibility of deriving the zone id from the skb->mark >> field in a scalable manner. This allows for having only a single template >> serving 100s .. 1000s of different zones, for example, instead of needing >> to have one match for each zone as an extra CT jump target. Note that we'd >> need to have this information attached to the template as at the time when >> we're trying to lookup a possible ct object, we already need to know zone >> information for a possible match when going into __nf_conntrack_find_get(). >> This work provides a minimal implementation for a possible mapping. > > I think connmark is a better place for this feature, given that the > zone is a ct extension. Moreover, I guess it will not take long until > someone sends us a patch to perform some bitwise operation to only > fetch some of the skb->mark bits into the zone. Hm, we do need the zoning information *before* we do the actual lookup for a ct object (non-template I mean), otherwise we don't know in which zone to find it. When I looked into this, the connmark target is applied afterwards on the actual ct object. So you mean to add this to the raw table, so that someone could for each skb assign ct->mark := skb->mark on the template and then have zone := ct->mark, so we can use it for looking up? I would also need more than a single template for that, right, as otherwise if I'd have arbitrary ct->mark := skb->mark assignments in parallel, then we'd race. The current seems rather simple. Thanks, Daniel -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/15/2015 10:04 PM, Daniel Borkmann wrote: > On 07/15/2015 07:50 PM, Pablo Neira Ayuso wrote: >> On Sat, Jul 11, 2015 at 03:14:07AM +0200, Daniel Borkmann wrote: >>> This work adds the possibility of deriving the zone id from the skb->mark >>> field in a scalable manner. This allows for having only a single template >>> serving 100s .. 1000s of different zones, for example, instead of needing >>> to have one match for each zone as an extra CT jump target. Note that we'd >>> need to have this information attached to the template as at the time when >>> we're trying to lookup a possible ct object, we already need to know zone >>> information for a possible match when going into __nf_conntrack_find_get(). >>> This work provides a minimal implementation for a possible mapping. >> >> I think connmark is a better place for this feature, given that the >> zone is a ct extension. Moreover, I guess it will not take long until >> someone sends us a patch to perform some bitwise operation to only >> fetch some of the skb->mark bits into the zone. > > Hm, we do need the zoning information *before* we do the actual lookup for > a ct object (non-template I mean), otherwise we don't know in which zone to > find it. When I looked into this, the connmark target is applied afterwards > on the actual ct object. > > So you mean to add this to the raw table, so that someone could for each skb > assign ct->mark := skb->mark on the template and then have zone := ct->mark, > so we can use it for looking up? I would also need more than a single template > for that, right, as otherwise if I'd have arbitrary ct->mark := skb->mark > assignments in parallel, then we'd race. The current seems rather simple. Sorry to bug you again on this patchset. I've been thinking a bit more about your suggestion, my previous thoughts/reply on this, and how to move forward. Lets presume for a moment, we would indeed place this into the connmark target. That would mean, we want to store the mark from the skb first into ct->mark and then into the zone of the ct. We would need to do this on the template itself, so that when the packet comes in, we know in which zone we're going to do the actual lookup of a ct entry. When using a single, global template, this seems racy, you would at least need to make sure that during the lookup it's not being overwritten by other incoming skbs. If you say, the ct->mark assignment should be skipped and the skb->mark should directly be assigned to the zone, then why putting this into connmark as it has nothing to do with the connection mark anymore (it would also avoid further unnecessary dependency of NF_CONNTRACK_ZONES on NF_CONNTRACK_MARK and NETFILTER_XT_CONNMARK). But also here, we'd have the same race issues as mentioned with ct->mark when overwriting zone ids. The only thing I could imagine would be for the template to hold a possible mask in the ct->mark itself, which could be user configured, and then does in pseudocode, tmpl_zone->id := skb->mark & tmpl->mark. If someone would really have such a use case (?), fair enough, this could be followed-up. The current approach implemented here that I found so far most appealing and having the least complexity, was to just have a /single/ template and to overwrite the zone->id with skb->mark on the ptr we have sitting on the stack. It avoids all the issues mentioned. But perhaps you mean something entirely different and I just seem to misinterpret your answer, hmm. Thanks again, Pablo. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jul 20, 2015 at 06:18:55PM +0200, Daniel Borkmann wrote: [...] > The current approach implemented here that I found so far most appealing > and having the least complexity, was to just have a /single/ template and to > overwrite the zone->id with skb->mark on the ptr we have sitting on the stack. > It avoids all the issues mentioned. But perhaps you mean something entirely > different and I just seem to misinterpret your answer, hmm. You mean something that from command line would look like: iptables -A PREROUTING -t raw -j CT --zone mark So we set the zone ID in the CT target based on the existing mark, right? -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/20/2015 07:03 PM, Pablo Neira Ayuso wrote: > On Mon, Jul 20, 2015 at 06:18:55PM +0200, Daniel Borkmann wrote: > [...] >> The current approach implemented here that I found so far most appealing >> and having the least complexity, was to just have a /single/ template and to >> overwrite the zone->id with skb->mark on the ptr we have sitting on the stack. >> It avoids all the issues mentioned. But perhaps you mean something entirely >> different and I just seem to misinterpret your answer, hmm. > > You mean something that from command line would look like: > > iptables -A PREROUTING -t raw -j CT --zone mark > > So we set the zone ID in the CT target based on the existing mark, > right? Not in the target callback, in the example script and patches I've provided, I'm indeed configuring ... iptables -t raw -A PREROUTING -j CT --zone mark --zone-dir ORIGINAL ... which in nf_ct_zone_tmpl() call-sites will return the skb->mark mapped zone ID, that is then used f.e. directly for the lookup in the hash table resp. following ct entry allocation in case a lookup didn't return a ct entry. Thanks, Daniel -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jul 20, 2015 at 07:27:17PM +0200, Daniel Borkmann wrote: > On 07/20/2015 07:03 PM, Pablo Neira Ayuso wrote: > >On Mon, Jul 20, 2015 at 06:18:55PM +0200, Daniel Borkmann wrote: > >[...] > >>The current approach implemented here that I found so far most appealing > >>and having the least complexity, was to just have a /single/ template and to > >>overwrite the zone->id with skb->mark on the ptr we have sitting on the stack. > >>It avoids all the issues mentioned. But perhaps you mean something entirely > >>different and I just seem to misinterpret your answer, hmm. > > > >You mean something that from command line would look like: > > > > iptables -A PREROUTING -t raw -j CT --zone mark > > > >So we set the zone ID in the CT target based on the existing mark, > >right? > > Not in the target callback, in the example script and patches I've provided, > I'm indeed configuring ... > > iptables -t raw -A PREROUTING -j CT --zone mark --zone-dir ORIGINAL > > ... which in nf_ct_zone_tmpl() call-sites will return the skb->mark mapped > zone ID, that is then used f.e. directly for the lookup in the hash table > resp. following ct entry allocation in case a lookup didn't return a ct entry. I see, thanks for explaining. I would like to avoid the use of the ct->status bit to set this. Can you see a clean way to store this bit in the zone extension instead? -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/20/2015 08:24 PM, Pablo Neira Ayuso wrote: ... > I see, thanks for explaining. > > I would like to avoid the use of the ct->status bit to set this. Can > you see a clean way to store this bit in the zone extension instead? Okay, understood, i.e. since it's unfortunately exported through UAPI and there's limited space. I'm thinking of renaming the u16 for the direction in the zones structure into 'flags' and just add an indicator there [as we still have unused bits there] ... would that seem better? Thanks, Daniel -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jul 20, 2015 at 10:05:16PM +0200, Daniel Borkmann wrote: > On 07/20/2015 08:24 PM, Pablo Neira Ayuso wrote: > ... > >I see, thanks for explaining. > > > >I would like to avoid the use of the ct->status bit to set this. Can > >you see a clean way to store this bit in the zone extension instead? > > Okay, understood, i.e. since it's unfortunately exported through UAPI > and there's limited space. I'm thinking of renaming the u16 for the > direction in the zones structure into 'flags' and just add an indicator > there [as we still have unused bits there] ... would that seem better? Grab u8 for flags. u8 to store directions should be sufficient I'd suggest. BTW, did you consider replacing NF_CT_DEFAULT_ZONE by a global object? It looks like a natural way in the patch that replaces the u16 by struct nf_conntrack_zone. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/21/2015 09:37 AM, Pablo Neira Ayuso wrote: > On Mon, Jul 20, 2015 at 10:05:16PM +0200, Daniel Borkmann wrote: >> On 07/20/2015 08:24 PM, Pablo Neira Ayuso wrote: >> ... >>> I see, thanks for explaining. >>> >>> I would like to avoid the use of the ct->status bit to set this. Can >>> you see a clean way to store this bit in the zone extension instead? >> >> Okay, understood, i.e. since it's unfortunately exported through UAPI >> and there's limited space. I'm thinking of renaming the u16 for the >> direction in the zones structure into 'flags' and just add an indicator >> there [as we still have unused bits there] ... would that seem better? > > Grab u8 for flags. u8 to store directions should be sufficient I'd > suggest. That's fine as well, will do. > BTW, did you consider replacing NF_CT_DEFAULT_ZONE by a global object? > It looks like a natural way in the patch that replaces the u16 by > struct nf_conntrack_zone. We still need the NF_CT_DEFAULT_ZONE itself, the ID I mean, in a couple of places, but I'll look into having a global default struct and replace it in these places that don't have zone support. Thanks, Daniel -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/net/netfilter/nf_conntrack_zones.h b/include/net/netfilter/nf_conntrack_zones.h index 9e1351b..52242fb 100644 --- a/include/net/netfilter/nf_conntrack_zones.h +++ b/include/net/netfilter/nf_conntrack_zones.h @@ -48,9 +48,19 @@ nf_ct_zone(const struct nf_conn *ct, struct nf_conntrack_zone *ptr) } static inline struct nf_conntrack_zone * -nf_ct_zone_tmpl(const struct nf_conn *tmpl, struct nf_conntrack_zone *ptr) +nf_ct_zone_tmpl(const struct nf_conn *tmpl, const struct sk_buff *skb, + struct nf_conntrack_zone *ptr) { - return tmpl ? nf_ct_zone(tmpl, ptr) : nf_ct_zone_dflt(ptr); + struct nf_conntrack_zone *zone; + + if (!tmpl) + return nf_ct_zone_dflt(ptr); + + zone = nf_ct_zone(tmpl, ptr); + if (tmpl->status & IPS_TEMPLATE_MARK) + zone->id = skb->mark; + + return zone; } static inline bool nf_ct_zone_matches_dir(const struct nf_conntrack_zone *zone, diff --git a/include/uapi/linux/netfilter/nf_conntrack_common.h b/include/uapi/linux/netfilter/nf_conntrack_common.h index 319f471..918c6bd 100644 --- a/include/uapi/linux/netfilter/nf_conntrack_common.h +++ b/include/uapi/linux/netfilter/nf_conntrack_common.h @@ -91,6 +91,10 @@ enum ip_conntrack_status { /* Conntrack got a helper explicitly attached via CT target. */ IPS_HELPER_BIT = 13, IPS_HELPER = (1 << IPS_HELPER_BIT), + + /* Template is marked dynamically. */ + IPS_TEMPLATE_MARK_BIT = 14, + IPS_TEMPLATE_MARK = (1 << IPS_TEMPLATE_MARK_BIT), }; /* Connection tracking event types */ diff --git a/include/uapi/linux/netfilter/xt_CT.h b/include/uapi/linux/netfilter/xt_CT.h index 452005f..9e52041 100644 --- a/include/uapi/linux/netfilter/xt_CT.h +++ b/include/uapi/linux/netfilter/xt_CT.h @@ -8,9 +8,11 @@ enum { XT_CT_NOTRACK_ALIAS = 1 << 1, XT_CT_ZONE_DIR_ORIG = 1 << 2, XT_CT_ZONE_DIR_REPL = 1 << 3, + XT_CT_ZONE_MARK = 1 << 4, XT_CT_MASK = XT_CT_NOTRACK | XT_CT_NOTRACK_ALIAS | - XT_CT_ZONE_DIR_ORIG | XT_CT_ZONE_DIR_REPL, + XT_CT_ZONE_DIR_ORIG | XT_CT_ZONE_DIR_REPL | + XT_CT_ZONE_MARK, }; struct xt_ct_target_info { diff --git a/net/ipv4/netfilter/nf_conntrack_proto_icmp.c b/net/ipv4/netfilter/nf_conntrack_proto_icmp.c index 75f7860..0ddd915 100644 --- a/net/ipv4/netfilter/nf_conntrack_proto_icmp.c +++ b/net/ipv4/netfilter/nf_conntrack_proto_icmp.c @@ -137,7 +137,7 @@ icmp_error_message(struct net *net, struct nf_conn *tmpl, struct sk_buff *skb, struct nf_conntrack_zone *zone, __zone; NF_CT_ASSERT(skb->nfct == NULL); - zone = nf_ct_zone_tmpl(tmpl, &__zone); + zone = nf_ct_zone_tmpl(tmpl, skb, &__zone); /* Are they talking about one of our connections? */ if (!nf_ct_get_tuplepr(skb, diff --git a/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c b/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c index d5ad71f..1c9a4c7 100644 --- a/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c +++ b/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c @@ -177,7 +177,8 @@ icmpv6_error_message(struct net *net, struct nf_conn *tmpl, *ctinfo = IP_CT_RELATED; - h = nf_conntrack_find_get(net, nf_ct_zone_tmpl(tmpl, &zone), &intuple); + h = nf_conntrack_find_get(net, nf_ct_zone_tmpl(tmpl, skb, &zone), + &intuple); if (!h) { pr_debug("icmpv6_error: no match\n"); return -NF_ACCEPT; diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index cf7c15a..7d89529 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -919,7 +919,7 @@ init_conntrack(struct net *net, struct nf_conn *tmpl, return NULL; } - zone = nf_ct_zone_tmpl(tmpl, &__zone); + zone = nf_ct_zone_tmpl(tmpl, skb, &__zone); ct = __nf_conntrack_alloc(net, zone, tuple, &repl_tuple, GFP_ATOMIC, hash); if (IS_ERR(ct)) @@ -1028,7 +1028,7 @@ resolve_normal_ct(struct net *net, struct nf_conn *tmpl, } /* look for tuple match */ - zone = nf_ct_zone_tmpl(tmpl, &__zone); + zone = nf_ct_zone_tmpl(tmpl, skb, &__zone); hash = hash_conntrack_raw(&tuple); h = __nf_conntrack_find_get(net, zone, &tuple, hash); if (!h) { diff --git a/net/netfilter/xt_CT.c b/net/netfilter/xt_CT.c index 8646075..f92cbe9 100644 --- a/net/netfilter/xt_CT.c +++ b/net/netfilter/xt_CT.c @@ -20,6 +20,8 @@ #include <net/netfilter/nf_conntrack_timeout.h> #include <net/netfilter/nf_conntrack_zones.h> +#define XT_CT_ZONE_FLAGS (XT_CT_ZONE_DIR_ORIG | XT_CT_ZONE_DIR_REPL | XT_CT_ZONE_MARK) + static inline int xt_ct_target(struct sk_buff *skb, struct nf_conn *ct) { /* Previously seen (loopback)? Ignore. */ @@ -207,8 +209,7 @@ static int xt_ct_tg_check(const struct xt_tgchk_param *par, } #ifndef CONFIG_NF_CONNTRACK_ZONES - if (info->zone || info->flags & (XT_CT_ZONE_DIR_ORIG | - XT_CT_ZONE_DIR_REPL)) + if (info->zone || info->flags & XT_CT_ZONE_FLAGS) goto err1; #endif @@ -225,6 +226,9 @@ static int xt_ct_tg_check(const struct xt_tgchk_param *par, if (IS_ERR(ct)) goto err2; + if (info->flags & XT_CT_ZONE_MARK) + ct->status |= IPS_TEMPLATE_MARK; + ret = 0; if ((info->ct_events || info->exp_events) && !nf_ct_ecache_ext_add(ct, info->ct_events, info->exp_events,
This work adds the possibility of deriving the zone id from the skb->mark field in a scalable manner. This allows for having only a single template serving 100s .. 1000s of different zones, for example, instead of needing to have one match for each zone as an extra CT jump target. Note that we'd need to have this information attached to the template as at the time when we're trying to lookup a possible ct object, we already need to know zone information for a possible match when going into __nf_conntrack_find_get(). This work provides a minimal implementation for a possible mapping. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> --- include/net/netfilter/nf_conntrack_zones.h | 14 ++++++++++++-- include/uapi/linux/netfilter/nf_conntrack_common.h | 4 ++++ include/uapi/linux/netfilter/xt_CT.h | 4 +++- net/ipv4/netfilter/nf_conntrack_proto_icmp.c | 2 +- net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c | 3 ++- net/netfilter/nf_conntrack_core.c | 4 ++-- net/netfilter/xt_CT.c | 8 ++++++-- 7 files changed, 30 insertions(+), 9 deletions(-)