Message ID: 1470339389-8542-18-git-send-email-kan.liang@intel.com
State: RFC, archived
Delegated to: David Miller
On 16-08-04 12:36 PM, kan.liang@intel.com wrote:
> From: Kan Liang <kan.liang@intel.com>
>
> To achieve better network performance, the key step is to distribute the
> packets to dedicated queues according to policy and system run time
> status.
>
> This patch provides an interface which can return the proper dedicated
> queue for a socket/task. The packets of the socket/task will then be
> redirected to the dedicated queue for better network performance.
>
> For selecting the proper queue, it currently uses a round-robin algorithm
> to find the available object from the given policy object list. The
> algorithm is good enough for now, but it could be improved by some
> adaptive algorithm later.
>
> The selected object will be stored in a hashtable, so it does not need to
> go through the whole object list every time.
>
> Signed-off-by: Kan Liang <kan.liang@intel.com>
> ---
>  include/linux/netpolicy.h |   5 ++
>  net/core/netpolicy.c      | 136 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 141 insertions(+)

There is a hook in the tx path now (recently added):

# ifdef CONFIG_NET_EGRESS
	if (static_key_false(&egress_needed)) {
		skb = sch_handle_egress(skb, &rc, dev);
		if (!skb)
			goto out;
	}
# endif

that allows pushing any policy you like for picking tx queues. It would
be better to use this mechanism. The hook runs 'tc' classifiers, so
either write a new ./net/sched/cls_*.c for this or just use eBPF to stick
your policy in at runtime.

I'm out of the office for a few days, but when I get back I can test that
it actually picks the selected queue in all cases; I know there was an
issue with some of the drivers using select_queue a while back.

.John
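[Editor's note: the egress hook above lets a tc classifier or eBPF program choose the TX queue per packet by writing skb->queue_mapping before real queue selection. As a rough illustration only, here is a Python stand-in (not kernel code; the flow tuple and queue count are invented) for the kind of flow-to-queue pinning policy such a classifier could apply:]

```python
# Stand-in for an egress classifier's queue-selection policy: pin each
# flow (identified here by a 5-tuple) to one of N TX queues by hashing,
# so every packet of a flow lands on the same queue. In the kernel this
# decision would be made by writing skb->queue_mapping from a tc
# classifier attached at the egress hook.
import hashlib

N_TXQ = 8  # hypothetical number of TX queues on the device

def pick_txq(flow):
    """flow: (src_ip, dst_ip, proto, sport, dport) tuple."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return digest[0] % N_TXQ  # stable hash -> stable queue per flow

flow = ("10.0.0.1", "10.0.0.2", "tcp", 12345, 80)
q1 = pick_txq(flow)
q2 = pick_txq(flow)
assert q1 == q2          # same flow always maps to the same queue
assert 0 <= q1 < N_TXQ
```

The point is only that the policy is arbitrary code at the hook: it could equally consult per-task state, a map populated from userspace, or a load metric.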
On 08/04/2016 10:21 PM, John Fastabend wrote:
> On 16-08-04 12:36 PM, kan.liang@intel.com wrote:
>> From: Kan Liang <kan.liang@intel.com>
>>
>> To achieve better network performance, the key step is to distribute the
>> packets to dedicated queues according to policy and system run time
>> status.
>>
>> This patch provides an interface which can return the proper dedicated
>> queue for a socket/task. The packets of the socket/task will then be
>> redirected to the dedicated queue for better network performance.
>>
>> For selecting the proper queue, it currently uses a round-robin algorithm
>> to find the available object from the given policy object list. The
>> algorithm is good enough for now, but it could be improved by some
>> adaptive algorithm later.
>>
>> The selected object will be stored in a hashtable, so it does not need to
>> go through the whole object list every time.
>>
>> Signed-off-by: Kan Liang <kan.liang@intel.com>
>> ---
>>  include/linux/netpolicy.h |   5 ++
>>  net/core/netpolicy.c      | 136 ++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 141 insertions(+)
>
> There is a hook in the tx path now (recently added):
>
> # ifdef CONFIG_NET_EGRESS
> 	if (static_key_false(&egress_needed)) {
> 		skb = sch_handle_egress(skb, &rc, dev);
> 		if (!skb)
> 			goto out;
> 	}
> # endif
>
> that allows pushing any policy you like for picking tx queues. It would
> be better to use this mechanism. The hook runs 'tc' classifiers, so
> either write a new ./net/sched/cls_*.c for this or just use eBPF to stick
> your policy in at runtime.
>
> I'm out of the office for a few days, but when I get back I can test that
> it actually picks the selected queue in all cases; I know there was an
> issue with some of the drivers using select_queue a while back.

+1, I tried to bring this up here [1] in the last spin. I think only very
few changes would be needed. F.e. on the eBPF side, adding a queue-setting
helper function is probably a straightforward ~10 LOC patch; and with
regards to actually picking it up after clsact egress, we'd need to adapt
__netdev_pick_tx() slightly when CONFIG_XPS is set so it doesn't override it.

[1] http://www.spinics.net/lists/netdev/msg386953.html
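[Editor's note: the ordering Daniel describes can be sketched abstractly. A minimal Python model (all names illustrative, not the actual __netdev_pick_tx() code): queue selection should honor a queue_mapping already set by the clsact egress program, and only fall back to an XPS-style CPU map otherwise.]

```python
# Toy model of TX queue selection that respects a queue already chosen
# by an egress classifier (skb.queue_mapping) and otherwise falls back
# to an XPS-style per-CPU map. This is a sketch of the proposed
# behavior, not the kernel implementation.
NO_QUEUE = -1
xps_map = {0: 0, 1: 1, 2: 2, 3: 3}  # cpu -> tx queue (hypothetical)

class Skb:
    def __init__(self, queue_mapping=NO_QUEUE):
        self.queue_mapping = queue_mapping

def netdev_pick_tx(skb, cpu):
    if skb.queue_mapping != NO_QUEUE:   # classifier already decided
        return skb.queue_mapping
    return xps_map.get(cpu, 0)          # XPS fallback

assert netdev_pick_tx(Skb(queue_mapping=5), cpu=2) == 5  # not overridden
assert netdev_pick_tx(Skb(), cpu=2) == 2                 # XPS path
```

Today, with CONFIG_XPS, the XPS lookup can clobber the classifier's choice; the suggested change is essentially the early-return branch above.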
> +1, I tried to bring this up here [1] in the last spin. I think only very
> few changes would be needed, f.e. on eBPF side to add a queue setting
> helper function which is probably straight forward ~10loc patch; and with
> regards to actually picking it up after clsact egress, we'd need to adapt
> __netdev_pick_tx() slightly when CONFIG_XPS so it doesn't override it.

You're proposing to rewrite the whole net policy manager as EBPF and run
it in a crappy JITer? Is that a serious proposal? It just sounds crazy
to me.

Especially since we already have a perfectly good compiler and
programming language to write system code in.

EBPF is ok for temporal instrumentation (if you somehow can accept
its security challenges), but using it to replace core
kernel functionality (which network policy IMHO is) with some bizarre
JITed setup and multiple languages doesn't really make any sense.

Especially it doesn't make sense for anything with shared state,
which is the core part of network policy: it negotiates with multiple
users.

After all, we're writing Linux here and not some research toy.

Thanks,
-Andi
On 08/05/2016 12:54 AM, Andi Kleen wrote:
>> +1, I tried to bring this up here [1] in the last spin. I think only very
>> few changes would be needed, f.e. on eBPF side to add a queue setting
>> helper function which is probably straight forward ~10loc patch; and with
>> regards to actually picking it up after clsact egress, we'd need to adapt
>> __netdev_pick_tx() slightly when CONFIG_XPS so it doesn't override it.
>
> You're proposing to rewrite the whole net policy manager as EBPF and run
> it in a crappy JITer? Is that a serious proposal? It just sounds crazy
> to me.
>
> Especially since we already have a perfectly good compiler and
> programming language to write system code in.
>
> EBPF is ok for temporal instrumentation (if you somehow can accept
> its security challenges), but using it to replace core
> kernel functionality (which network policy IMHO is) with some bizarre
> JITed setup and multiple languages doesn't really make any sense.
>
> Especially it doesn't make sense for anything with shared state,
> which is the core part of network policy: it negotiates with multiple
> users.
>
> After all we're writing Linux here and not some research toy.

From what I read, I guess you didn't really bother to look any deeper into
this bizarre "research toy" to double-check some of your claims. One of the
things it's often deployed for, by the way, is defining policy. And the
suggestion here was merely to explore existing infrastructure around things
like tc and whether it already resolves at least a part of your net policy
manager's requirements (like queue selection), or whether existing
infrastructure can be extended with less complexity this way (as was
mentioned, with a new cls module as one option).
On Thu, Aug 4, 2016 at 12:36 PM, <kan.liang@intel.com> wrote:
> From: Kan Liang <kan.liang@intel.com>
>
> To achieve better network performance, the key step is to distribute the
> packets to dedicated queues according to policy and system run time
> status.
>
> This patch provides an interface which can return the proper dedicated
> queue for a socket/task. The packets of the socket/task will then be
> redirected to the dedicated queue for better network performance.
>
> For selecting the proper queue, it currently uses a round-robin algorithm
> to find the available object from the given policy object list. The
> algorithm is good enough for now, but it could be improved by some
> adaptive algorithm later.
>
Seriously? You want all of this code so we revert to TX queue selection
by round robin?

> The selected object will be stored in a hashtable, so it does not need to
> go through the whole object list every time.
>
> Signed-off-by: Kan Liang <kan.liang@intel.com>
> ---
>  include/linux/netpolicy.h |   5 ++
>  net/core/netpolicy.c      | 136 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 141 insertions(+)
>
> diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
> index 5900252..a522015 100644
> --- a/include/linux/netpolicy.h
> +++ b/include/linux/netpolicy.h
> @@ -97,6 +97,7 @@ extern void update_netpolicy_sys_map(void);
>  extern int netpolicy_register(struct netpolicy_instance *instance,
> 			      enum netpolicy_name policy);
>  extern void netpolicy_unregister(struct netpolicy_instance *instance);
> +extern int netpolicy_pick_queue(struct netpolicy_instance *instance, bool is_rx);
>  #else
>  static inline void update_netpolicy_sys_map(void)
>  {
> @@ -111,6 +112,10 @@ static inline void netpolicy_unregister(struct netpolicy_instance *instance)
>  {
>  }
>
> +static inline int netpolicy_pick_queue(struct netpolicy_instance *instance, bool is_rx)
> +{
> +	return 0;
> +}
>  #endif
>
>  #endif /*__LINUX_NETPOLICY_H*/
> diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
> index 3605761..98ca430 100644
> --- a/net/core/netpolicy.c
> +++ b/net/core/netpolicy.c
> @@ -290,6 +290,142 @@ static void netpolicy_record_clear_dev_node(struct net_device *dev)
>  	spin_unlock_bh(&np_hashtable_lock);
>  }
>
> +static struct netpolicy_object *get_avail_object(struct net_device *dev,
> +						 enum netpolicy_name policy,
> +						 bool is_rx)
> +{
> +	int dir = is_rx ? NETPOLICY_RX : NETPOLICY_TX;
> +	struct netpolicy_object *tmp, *obj = NULL;
> +	int val = -1;
> +
> +	/* Check if net policy is supported */
> +	if (!dev || !dev->netpolicy)
> +		return NULL;
> +
> +	/* The system should have queues which support the request policy. */
> +	if ((policy != dev->netpolicy->cur_policy) &&
> +	    (dev->netpolicy->cur_policy != NET_POLICY_MIX))
> +		return NULL;
> +
> +	spin_lock_bh(&dev->np_ob_list_lock);
> +	list_for_each_entry(tmp, &dev->netpolicy->obj_list[dir][policy], list) {
> +		if ((val > atomic_read(&tmp->refcnt)) ||
> +		    (val == -1)) {
> +			val = atomic_read(&tmp->refcnt);
> +			obj = tmp;
> +		}
> +	}
> +
> +	if (WARN_ON(!obj)) {
> +		spin_unlock_bh(&dev->np_ob_list_lock);
> +		return NULL;
> +	}
> +	atomic_inc(&obj->refcnt);
> +	spin_unlock_bh(&dev->np_ob_list_lock);
> +
> +	return obj;
> +}
> +
> +static int get_avail_queue(struct netpolicy_instance *instance, bool is_rx)
> +{
> +	struct netpolicy_record *old_record, *new_record;
> +	struct net_device *dev = instance->dev;
> +	unsigned long ptr_id = (uintptr_t)instance->ptr;
> +	int queue = -1;
> +
> +	spin_lock_bh(&np_hashtable_lock);
> +	old_record = netpolicy_record_search(ptr_id);
> +	if (!old_record) {
> +		pr_warn("NETPOLICY: doesn't registered. Remove net policy settings!\n");
> +		instance->policy = NET_POLICY_INVALID;
> +		goto err;
> +	}
> +
> +	if (is_rx && old_record->rx_obj) {
> +		queue = old_record->rx_obj->queue;
> +	} else if (!is_rx && old_record->tx_obj) {
> +		queue = old_record->tx_obj->queue;
> +	} else {
> +		new_record = kzalloc(sizeof(*new_record), GFP_KERNEL);
> +		if (!new_record)
> +			goto err;
> +		memcpy(new_record, old_record, sizeof(*new_record));
> +
> +		if (is_rx) {
> +			new_record->rx_obj = get_avail_object(dev, new_record->policy, is_rx);
> +			if (!new_record->dev)
> +				new_record->dev = dev;
> +			if (!new_record->rx_obj) {
> +				kfree(new_record);
> +				goto err;
> +			}
> +			queue = new_record->rx_obj->queue;
> +		} else {
> +			new_record->tx_obj = get_avail_object(dev, new_record->policy, is_rx);
> +			if (!new_record->dev)
> +				new_record->dev = dev;
> +			if (!new_record->tx_obj) {
> +				kfree(new_record);
> +				goto err;
> +			}
> +			queue = new_record->tx_obj->queue;
> +		}
> +		/* update record */
> +		hlist_replace_rcu(&old_record->hash_node, &new_record->hash_node);
> +		kfree(old_record);
> +	}
> +err:
> +	spin_unlock_bh(&np_hashtable_lock);
> +	return queue;
> +}
> +
> +static inline bool policy_validate(struct netpolicy_instance *instance)
> +{
> +	struct net_device *dev = instance->dev;
> +	enum netpolicy_name cur_policy;
> +
> +	cur_policy = dev->netpolicy->cur_policy;
> +	if ((instance->policy == NET_POLICY_NONE) ||
> +	    (cur_policy == NET_POLICY_NONE))
> +		return false;
> +
> +	if (((cur_policy != NET_POLICY_MIX) && (cur_policy != instance->policy)) ||
> +	    ((cur_policy == NET_POLICY_MIX) && (instance->policy == NET_POLICY_CPU))) {
> +		pr_warn("NETPOLICY: %s current device policy %s doesn't support required policy %s! Remove net policy settings!\n",
> +			dev->name, policy_name[cur_policy],
> +			policy_name[instance->policy]);
> +		return false;
> +	}
> +	return true;
> +}
> +
> +/**
> + * netpolicy_pick_queue() - Find proper queue
> + * @instance: NET policy per socket/task instance info
> + * @is_rx: RX queue or TX queue
> + *
> + * This function intends to find the proper queue according to policy.
> + * For selecting the proper queue, currently it uses round-robin algorithm
> + * to find the available object from the given policy object list.
> + * The selected object will be stored in hashtable. So it does not need to
> + * go through the whole object list every time.
> + *
> + * Return: negative on failure, otherwise on the assigned queue
> + */
> +int netpolicy_pick_queue(struct netpolicy_instance *instance, bool is_rx)
> +{
> +	struct net_device *dev = instance->dev;
> +
> +	if (!dev || !dev->netpolicy)
> +		return -EINVAL;
> +
> +	if (!policy_validate(instance))
> +		return -EINVAL;
> +
> +	return get_avail_queue(instance, is_rx);
> +}
> +EXPORT_SYMBOL(netpolicy_pick_queue);
> +
>  /**
>   * netpolicy_register() - Register per socket/task policy request
>   * @instance: NET policy per socket/task instance info
> --
> 2.5.5
> On Thu, Aug 4, 2016 at 12:36 PM, <kan.liang@intel.com> wrote:
>> From: Kan Liang <kan.liang@intel.com>
>>
>> To achieve better network performance, the key step is to distribute
>> the packets to dedicated queues according to policy and system run
>> time status.
>>
>> This patch provides an interface which can return the proper dedicated
>> queue for a socket/task. The packets of the socket/task will then be
>> redirected to the dedicated queue for better network performance.
>>
>> For selecting the proper queue, it currently uses a round-robin
>> algorithm to find the available object from the given policy object
>> list. The algorithm is good enough for now, but it could be improved
>> by some adaptive algorithm later.
>>
> Seriously? You want all of this code so we revert to TX queue selection
> by round robin?

I agree that round robin is not an optimal algorithm.
For this series, we intend to provide a generic infrastructure for review.
The algorithm part is already on our TODO list; we will replace it later.

Thanks,
Kan
On Fri, Aug 5, 2016 at 6:55 AM, Liang, Kan <kan.liang@intel.com> wrote:
>> On Thu, Aug 4, 2016 at 12:36 PM, <kan.liang@intel.com> wrote:
>>> From: Kan Liang <kan.liang@intel.com>
>>>
>>> To achieve better network performance, the key step is to distribute
>>> the packets to dedicated queues according to policy and system run
>>> time status.
>>>
>>> This patch provides an interface which can return the proper dedicated
>>> queue for a socket/task. The packets of the socket/task will then be
>>> redirected to the dedicated queue for better network performance.
>>>
>>> For selecting the proper queue, it currently uses a round-robin
>>> algorithm to find the available object from the given policy object
>>> list. The algorithm is good enough for now, but it could be improved
>>> by some adaptive algorithm later.
>>>
>> Seriously? You want all of this code so we revert to TX queue selection
>> by round robin?
>>
> I agree that round robin is not an optimal algorithm.
> For this series, we intend to provide a generic infrastructure for review.
> The algorithm part is already on our TODO list; we will replace it later.

Kan,

The justification for this patch is that, to achieve better network
performance, TX packets should be distributed to dedicated queues
according to policy and system runtime status. But the patch doesn't
remotely implement that, there's no data provided that these changes do
anything useful or ever will, and it seems like this is completely
ignoring existing mechanisms like XPS that have proven to improve
performance.

Tom
On Thu, Aug 4, 2016 at 5:17 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 08/05/2016 12:54 AM, Andi Kleen wrote:
>>> +1, I tried to bring this up here [1] in the last spin. I think only very
>>> few changes would be needed, f.e. on eBPF side to add a queue setting
>>> helper function which is probably straight forward ~10loc patch; and with
>>> regards to actually picking it up after clsact egress, we'd need to adapt
>>> __netdev_pick_tx() slightly when CONFIG_XPS so it doesn't override it.
>>
>> You're proposing to rewrite the whole net policy manager as EBPF and run
>> it in a crappy JITer? Is that a serious proposal? It just sounds crazy
>> to me.
>>
>> Especially since we already have a perfectly good compiler and
>> programming language to write system code in.
>>
>> EBPF is ok for temporal instrumentation (if you somehow can accept
>> its security challenges), but using it to replace core
>> kernel functionality (which network policy IMHO is) with some bizarre
>> JITed setup and multiple languages doesn't really make any sense.
>>
>> Especially it doesn't make sense for anything with shared state,
>> which is the core part of network policy: it negotiates with multiple
>> users.
>>
>> After all we're writing Linux here and not some research toy.
>
> From what I read I guess you didn't really bother to look any deeper into
> this bizarre "research toy" to double check some of your claims. One of the
> things it's often deployed for by the way is defining policy. And the
> suggestion here was merely to explore existing infrastructure around things
> like tc and whether it already resolves at least a part of your net policy
> manager's requirements (like queue selection) or whether existing
> infrastructure can be extended with fewer complexity this way (as was
> mentioned with a new cls module as one option).

+1. The SO_REUSEPORT + BPF patches have already demonstrated the value of
making policy in the kernel programmable. There's no reason to believe
that model can't be extended to cover packet steering in the data path
for TX or RX, as well as other cases.

Tom
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 5900252..a522015 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -97,6 +97,7 @@ extern void update_netpolicy_sys_map(void);
 extern int netpolicy_register(struct netpolicy_instance *instance,
			      enum netpolicy_name policy);
 extern void netpolicy_unregister(struct netpolicy_instance *instance);
+extern int netpolicy_pick_queue(struct netpolicy_instance *instance, bool is_rx);
 #else
 static inline void update_netpolicy_sys_map(void)
 {
@@ -111,6 +112,10 @@ static inline void netpolicy_unregister(struct netpolicy_instance *instance)
 {
 }

+static inline int netpolicy_pick_queue(struct netpolicy_instance *instance, bool is_rx)
+{
+	return 0;
+}
 #endif

 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 3605761..98ca430 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -290,6 +290,142 @@ static void netpolicy_record_clear_dev_node(struct net_device *dev)
 	spin_unlock_bh(&np_hashtable_lock);
 }

+static struct netpolicy_object *get_avail_object(struct net_device *dev,
+						 enum netpolicy_name policy,
+						 bool is_rx)
+{
+	int dir = is_rx ? NETPOLICY_RX : NETPOLICY_TX;
+	struct netpolicy_object *tmp, *obj = NULL;
+	int val = -1;
+
+	/* Check if net policy is supported */
+	if (!dev || !dev->netpolicy)
+		return NULL;
+
+	/* The system should have queues which support the request policy. */
+	if ((policy != dev->netpolicy->cur_policy) &&
+	    (dev->netpolicy->cur_policy != NET_POLICY_MIX))
+		return NULL;
+
+	spin_lock_bh(&dev->np_ob_list_lock);
+	list_for_each_entry(tmp, &dev->netpolicy->obj_list[dir][policy], list) {
+		if ((val > atomic_read(&tmp->refcnt)) ||
+		    (val == -1)) {
+			val = atomic_read(&tmp->refcnt);
+			obj = tmp;
+		}
+	}
+
+	if (WARN_ON(!obj)) {
+		spin_unlock_bh(&dev->np_ob_list_lock);
+		return NULL;
+	}
+	atomic_inc(&obj->refcnt);
+	spin_unlock_bh(&dev->np_ob_list_lock);
+
+	return obj;
+}
+
+static int get_avail_queue(struct netpolicy_instance *instance, bool is_rx)
+{
+	struct netpolicy_record *old_record, *new_record;
+	struct net_device *dev = instance->dev;
+	unsigned long ptr_id = (uintptr_t)instance->ptr;
+	int queue = -1;
+
+	spin_lock_bh(&np_hashtable_lock);
+	old_record = netpolicy_record_search(ptr_id);
+	if (!old_record) {
+		pr_warn("NETPOLICY: doesn't registered. Remove net policy settings!\n");
+		instance->policy = NET_POLICY_INVALID;
+		goto err;
+	}
+
+	if (is_rx && old_record->rx_obj) {
+		queue = old_record->rx_obj->queue;
+	} else if (!is_rx && old_record->tx_obj) {
+		queue = old_record->tx_obj->queue;
+	} else {
+		new_record = kzalloc(sizeof(*new_record), GFP_KERNEL);
+		if (!new_record)
+			goto err;
+		memcpy(new_record, old_record, sizeof(*new_record));
+
+		if (is_rx) {
+			new_record->rx_obj = get_avail_object(dev, new_record->policy, is_rx);
+			if (!new_record->dev)
+				new_record->dev = dev;
+			if (!new_record->rx_obj) {
+				kfree(new_record);
+				goto err;
+			}
+			queue = new_record->rx_obj->queue;
+		} else {
+			new_record->tx_obj = get_avail_object(dev, new_record->policy, is_rx);
+			if (!new_record->dev)
+				new_record->dev = dev;
+			if (!new_record->tx_obj) {
+				kfree(new_record);
+				goto err;
+			}
+			queue = new_record->tx_obj->queue;
+		}
+		/* update record */
+		hlist_replace_rcu(&old_record->hash_node, &new_record->hash_node);
+		kfree(old_record);
+	}
+err:
+	spin_unlock_bh(&np_hashtable_lock);
+	return queue;
+}
+
+static inline bool policy_validate(struct netpolicy_instance *instance)
+{
+	struct net_device *dev = instance->dev;
+	enum netpolicy_name cur_policy;
+
+	cur_policy = dev->netpolicy->cur_policy;
+	if ((instance->policy == NET_POLICY_NONE) ||
+	    (cur_policy == NET_POLICY_NONE))
+		return false;
+
+	if (((cur_policy != NET_POLICY_MIX) && (cur_policy != instance->policy)) ||
+	    ((cur_policy == NET_POLICY_MIX) && (instance->policy == NET_POLICY_CPU))) {
+		pr_warn("NETPOLICY: %s current device policy %s doesn't support required policy %s! Remove net policy settings!\n",
+			dev->name, policy_name[cur_policy],
+			policy_name[instance->policy]);
+		return false;
+	}
+	return true;
+}
+
+/**
+ * netpolicy_pick_queue() - Find proper queue
+ * @instance: NET policy per socket/task instance info
+ * @is_rx: RX queue or TX queue
+ *
+ * This function intends to find the proper queue according to policy.
+ * For selecting the proper queue, currently it uses round-robin algorithm
+ * to find the available object from the given policy object list.
+ * The selected object will be stored in hashtable. So it does not need to
+ * go through the whole object list every time.
+ *
+ * Return: negative on failure, otherwise on the assigned queue
+ */
+int netpolicy_pick_queue(struct netpolicy_instance *instance, bool is_rx)
+{
+	struct net_device *dev = instance->dev;
+
+	if (!dev || !dev->netpolicy)
+		return -EINVAL;
+
+	if (!policy_validate(instance))
+		return -EINVAL;
+
+	return get_avail_queue(instance, is_rx);
+}
+EXPORT_SYMBOL(netpolicy_pick_queue);
+
 /**
  * netpolicy_register() - Register per socket/task policy request
  * @instance: NET policy per socket/task instance info
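[Editor's note: despite the "round-robin" wording in the commit message, get_avail_object() actually does a least-referenced scan: it walks the policy's object list, picks the entry with the smallest refcnt, bumps it, and get_avail_queue() caches the choice in the per-socket/task record so later lookups skip the list walk. A simplified Python sketch of that selection (data structures invented for illustration):]

```python
# Simplified model of get_avail_object()/get_avail_queue(): choose the
# queue object with the lowest reference count, bump its refcount, and
# cache the choice per socket/task so subsequent calls skip the scan.
class Obj:
    def __init__(self, queue):
        self.queue = queue
        self.refcnt = 0

obj_list = [Obj(0), Obj(1), Obj(2)]  # stand-in for obj_list[dir][policy]
records = {}  # ptr_id -> chosen Obj (the hashtable in the patch)

def get_avail_object(objs):
    obj = min(objs, key=lambda o: o.refcnt)  # least-loaded entry wins
    obj.refcnt += 1
    return obj

def pick_queue(ptr_id):
    if ptr_id not in records:               # slow path: scan and cache
        records[ptr_id] = get_avail_object(obj_list)
    return records[ptr_id].queue            # fast path: cached record

qs = [pick_queue(i) for i in range(3)]
assert sorted(qs) == [0, 1, 2]    # three sockets spread over three queues
assert pick_queue(0) == qs[0]     # cached: same socket keeps its queue
```

This also makes the reviewers' point concrete: the selection considers only the policy's own refcounts, not actual queue load or CPU locality, which is what XPS-style mechanisms already account for.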