| Message ID | 1271669822.16881.7520.camel@edumazet-laptop |
|---|---|
| State | Superseded, archived |
| Delegated to | David Miller |
On Mon, Apr 19, 2010 at 5:37 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> net_rps_action() is a bit expensive on NR_CPUS=64..4096 kernels, even if
> RPS is not active.
>
> I add a flag to scan cpumask only if at least one IPI was scheduled.
> Even cpumask_weight() might be expensive on some setups, where
> nr_cpumask_bits could be very big (4096 for example)

How about using an array to save the cpu IDs? The number of CPUs to
which the IPI will be sent should be small.
On Monday, April 19, 2010 at 17:48 +0800, Changli Gao wrote:
> On Mon, Apr 19, 2010 at 5:37 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > net_rps_action() is a bit expensive on NR_CPUS=64..4096 kernels, even if
> > RPS is not active.
> >
> > I add a flag to scan cpumask only if at least one IPI was scheduled.
> > Even cpumask_weight() might be expensive on some setups, where
> > nr_cpumask_bits could be very big (4096 for example)
>
> How about using an array to save the cpu IDs? The number of CPUs, to
> which the IPI will be sent, should be small.

Yes, it should be small, yet the two arrays would be big enough to make
the first part of softnet_data use at least two cache lines instead of
one, even in the case where we handle one cpu/IPI per net_rps_action().

As several packets can be enqueued for a given cpu, we would need to
keep bitmasks. We would have to add one test in enqueue_to_backlog():

	if (!cpu_test_and_set(cpu, mask)) {
		__raise_softirq_irqoff(NET_RX_SOFTIRQ);
		array[nb++] = cpu;
	}

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
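The mask-plus-array idea above can be sketched as a small user-space model
(illustrative only: `struct ipi_pending`, `ipi_mark` and the fixed-size
arrays are hypothetical names, not kernel API; the kernel equivalent would
use cpumask and per-CPU data):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_CPUS 64

/* The bitmask deduplicates CPUs (several packets may be enqueued for
 * the same cpu), while the array remembers which CPUs were hit so the
 * flush loop does not have to scan all NR_CPUS bits. */
struct ipi_pending {
	unsigned long mask[NR_CPUS / (8 * sizeof(unsigned long))];
	int cpus[NR_CPUS];
	int nb;
};

/* Returns true only on the first enqueue for @cpu in this round,
 * i.e. when an IPI would actually have to be scheduled. */
static bool ipi_mark(struct ipi_pending *p, int cpu)
{
	unsigned long bit = 1UL << (cpu % (8 * sizeof(unsigned long)));
	unsigned long *word = &p->mask[cpu / (8 * sizeof(unsigned long))];

	if (*word & bit)
		return false;		/* already scheduled, no new IPI */
	*word |= bit;
	p->cpus[p->nb++] = cpu;		/* remember it for the flush loop */
	return true;
}
```

A flush would then walk only `cpus[0..nb-1]` instead of the whole mask,
which is the point of keeping the array next to the bitmap.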
On Mon, Apr 19, 2010 at 8:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> As several packets can be enqueued for a given cpu, we would need to
> keep bitmasks.
> We would have to add one test in enqueue_to_backlog()
>
>         if (cpu_test_and_set(cpu, mask)) {
>                 __raise_softirq_irqoff(NET_RX_SOFTIRQ);
>                 array[nb++] = cpu;
>         }

	rps_lock(queue);
	if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
		if (queue->input_pkt_queue.qlen) {
			...
	if (napi_schedule_prep(&queue->backlog)) {
#ifdef CONFIG_RPS
		if (cpu != smp_processor_id()) {
			struct rps_remote_softirq_cpus *rcpus =
				&__get_cpu_var(rps_remote_softirq_cpus);

			cpu_set(cpu, rcpus->mask[rcpus->select]);
			__raise_softirq_irqoff(NET_RX_SOFTIRQ);
			goto enqueue;
		}
#endif
		__napi_schedule(&queue->backlog);
	}

Only the first packet of a softnet.input_pkt_queue may trigger an IPI, so
we don't need to keep bitmasks.
On Monday, April 19, 2010 at 20:28 +0800, Changli Gao wrote:
> On Mon, Apr 19, 2010 at 8:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > As several packets can be enqueued for a given cpu, we would need to
> > keep bitmasks.
> > We would have to add one test in enqueue_to_backlog()
> >
> >         if (cpu_test_and_set(cpu, mask)) {
> >                 __raise_softirq_irqoff(NET_RX_SOFTIRQ);
> >                 array[nb++] = cpu;
> >         }
>
>         rps_lock(queue);
>         if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
>                 if (queue->input_pkt_queue.qlen) {
>                         ...
>         if (napi_schedule_prep(&queue->backlog)) {
> #ifdef CONFIG_RPS
>                 if (cpu != smp_processor_id()) {
>                         struct rps_remote_softirq_cpus *rcpus =
>                                 &__get_cpu_var(rps_remote_softirq_cpus);
>
>                         cpu_set(cpu, rcpus->mask[rcpus->select]);
>                         __raise_softirq_irqoff(NET_RX_SOFTIRQ);
>                         goto enqueue;
>                 }
> #endif
>                 __napi_schedule(&queue->backlog);
>         }
>
> Only the first packet of a softnet.input_pkt_queue may trigger an IPI, so
> we don't need to keep bitmasks.

This is not true, Changli.

Please read again all the previous mails about RPS, or the code.
On Monday, April 19, 2010 at 15:27 +0200, Eric Dumazet wrote:
> This is not true, Changli.
>
> Please read again all the previous mails about RPS, or the code.

Hmm, I just read it again, and I now remember: Tom used a single bitmap,
then we had to add a second set because of a possible race. A list would
be enough.
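The two-set scheme mentioned above can be illustrated with a minimal
user-space model (the names `struct two_masks`, `mark_cpu` and `flush_ipis`
are made up for the sketch, not kernel API): the enqueue path always marks
the currently selected mask, and the flush path flips the selector before
draining the other mask, so marks racing with the drain land in the fresh
mask instead of being lost.

```c
#include <assert.h>
#include <string.h>

#define NR_CPUS 8

/* 1 in mask[select][cpu] means an IPI is pending for that cpu.
 * In the kernel, mark_cpu runs with irqs disabled; flush_ipis flips
 * select while still protected, then drains with irqs enabled. */
struct two_masks {
	unsigned char mask[2][NR_CPUS];
	int select;
};

static void mark_cpu(struct two_masks *m, int cpu)
{
	m->mask[m->select][cpu] = 1;
}

/* Returns the number of IPIs that would be sent in this round. */
static int flush_ipis(struct two_masks *m)
{
	int old = m->select;
	int cpu, sent = 0;

	m->select ^= 1;			/* flip while still protected */
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (m->mask[old][cpu]) {
			m->mask[old][cpu] = 0;
			sent++;		/* kernel would send the IPI here */
		}
	}
	return sent;
}
```

Marking the same cpu twice before a flush still costs only one IPI, which
is why a plain list without deduplication would not be equivalent.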
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 649a025..283d3ef 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1389,8 +1389,11 @@ struct softnet_data {
 	struct list_head	poll_list;
 	struct sk_buff		*completion_queue;

-	/* Elements below can be accessed between CPUs for RPS */
 #ifdef CONFIG_RPS
+	unsigned int		rps_ipis_scheduled;
+	unsigned int		rps_select;
+	cpumask_t		rps_mask[2];
+	/* Elements below can be accessed between CPUs for RPS */
 	struct call_single_data	csd ____cacheline_aligned_in_smp;
 	unsigned int		input_queue_head;
 #endif
diff --git a/net/core/dev.c b/net/core/dev.c
index 7abf959..3e6e420 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2347,19 +2347,14 @@ done:
 }

 /*
- * This structure holds the per-CPU mask of CPUs for which IPIs are scheduled
+ * softnet_data holds the per-CPU mask of CPUs for which IPIs are scheduled
  * to be sent to kick remote softirq processing. There are two masks since
- * the sending of IPIs must be done with interrupts enabled. The select field
+ * the sending of IPIs must be done with interrupts enabled. The rps_select field
  * indicates the current mask that enqueue_backlog uses to schedule IPIs.
  * select is flipped before net_rps_action is called while still under lock,
  * net_rps_action then uses the non-selected mask to send the IPIs and clears
  * it without conflicting with enqueue_backlog operation.
  */
-struct rps_remote_softirq_cpus {
-	cpumask_t mask[2];
-	int select;
-};
-static DEFINE_PER_CPU(struct rps_remote_softirq_cpus, rps_remote_softirq_cpus);

 /* Called from hardirq (IPI) context */
 static void trigger_softirq(void *data)
@@ -2403,10 +2398,10 @@ enqueue:
 	if (napi_schedule_prep(&queue->backlog)) {
 #ifdef CONFIG_RPS
 		if (cpu != smp_processor_id()) {
-			struct rps_remote_softirq_cpus *rcpus =
-			    &__get_cpu_var(rps_remote_softirq_cpus);
+			struct softnet_data *myqueue = &__get_cpu_var(softnet_data);

-			cpu_set(cpu, rcpus->mask[rcpus->select]);
+			cpu_set(cpu, myqueue->rps_mask[myqueue->rps_select]);
+			myqueue->rps_ipis_scheduled = 1;
 			__raise_softirq_irqoff(NET_RX_SOFTIRQ);
 			goto enqueue;
 		}
@@ -2911,7 +2906,9 @@ int netif_receive_skb(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(netif_receive_skb);

-/* Network device is going away, flush any packets still pending */
+/* Network device is going away, flush any packets still pending
+ * Called with irqs disabled.
+ */
 static void flush_backlog(void *arg)
 {
 	struct net_device *dev = arg;
@@ -3340,24 +3337,36 @@ void netif_napi_del(struct napi_struct *napi)
 }
 EXPORT_SYMBOL(netif_napi_del);

-#ifdef CONFIG_RPS
 /*
- * net_rps_action sends any pending IPI's for rps. This is only called from
- * softirq and interrupts must be enabled.
+ * net_rps_action sends any pending IPI's for rps.
+ * Note: called with local irq disabled, but exits with local irq enabled.
  */
-static void net_rps_action(cpumask_t *mask)
+static void net_rps_action(void)
 {
-	int cpu;
+#ifdef CONFIG_RPS
+	if (percpu_read(softnet_data.rps_ipis_scheduled)) {
+		struct softnet_data *queue = &__get_cpu_var(softnet_data);
+		int cpu, select = queue->rps_select;
+		cpumask_t *mask;
+
+		queue->rps_ipis_scheduled = 0;
+		queue->rps_select ^= 1;

-	/* Send pending IPI's to kick RPS processing on remote cpus. */
-	for_each_cpu_mask_nr(cpu, *mask) {
-		struct softnet_data *queue = &per_cpu(softnet_data, cpu);
-		if (cpu_online(cpu))
-			__smp_call_function_single(cpu, &queue->csd, 0);
-	}
-	cpus_clear(*mask);
-}
+		local_irq_enable();
+
+		mask = &queue->rps_mask[select];
+
+		/* Send pending IPI's to kick RPS processing on remote cpus. */
+		for_each_cpu_mask_nr(cpu, *mask) {
+			struct softnet_data *remqueue = &per_cpu(softnet_data, cpu);
+			if (cpu_online(cpu))
+				__smp_call_function_single(cpu, &remqueue->csd, 0);
+		}
+		cpus_clear(*mask);
+	} else
 #endif
+		local_irq_enable();
+}

 static void net_rx_action(struct softirq_action *h)
 {
@@ -3365,10 +3374,6 @@ static void net_rx_action(struct softirq_action *h)
 	unsigned long time_limit = jiffies + 2;
 	int budget = netdev_budget;
 	void *have;
-#ifdef CONFIG_RPS
-	int select;
-	struct rps_remote_softirq_cpus *rcpus;
-#endif

 	local_irq_disable();
@@ -3431,17 +3436,7 @@ static void net_rx_action(struct softirq_action *h)
 		netpoll_poll_unlock(have);
 	}
 out:
-#ifdef CONFIG_RPS
-	rcpus = &__get_cpu_var(rps_remote_softirq_cpus);
-	select = rcpus->select;
-	rcpus->select ^= 1;
-
-	local_irq_enable();
-
-	net_rps_action(&rcpus->mask[select]);
-#else
-	local_irq_enable();
-#endif
+	net_rps_action();

 #ifdef CONFIG_NET_DMA
 	/*
net_rps_action() is a bit expensive on NR_CPUS=64..4096 kernels, even if
RPS is not active.

I add a flag to scan the cpumask only if at least one IPI was scheduled.
Even cpumask_weight() might be expensive on some setups, where
nr_cpumask_bits could be very big (4096 for example).

Move all RPS logic into net_rps_action() to clean up net_rx_action()
(this removes two ifdefs).

Move rps_remote_softirq_cpus into softnet_data to share its first cache
line, filling an existing hole.

In a future patch, we could call net_rps_action() from process_backlog()
to make sure we send IPIs before handling this cpu's backlog.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/netdevice.h |    5 +-
 net/core/dev.c            |   73 ++++++++++++++++--------------------
 2 files changed, 38 insertions(+), 40 deletions(-)
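The rps_ipis_scheduled fast path described in the changelog can be modelled
in user space as follows (a sketch under assumed names: `struct sd_model`,
`schedule_ipi` and `rps_action` are illustrative, not the kernel code; the
counter only exists to make the scan cost visible):

```c
#include <assert.h>
#include <string.h>

#define NR_CPUS 4096

/* A single flag lets the per-round flush skip the O(NR_CPUS) mask scan
 * entirely when no IPI was scheduled, which is the common case when RPS
 * is idle. scan_count records how many bits were examined. */
struct sd_model {
	int ipis_scheduled;
	unsigned char mask[NR_CPUS];
	long scan_count;
};

static void schedule_ipi(struct sd_model *sd, int cpu)
{
	sd->mask[cpu] = 1;
	sd->ipis_scheduled = 1;	/* one store instead of a mask scan later */
}

static void rps_action(struct sd_model *sd)
{
	int cpu;

	if (!sd->ipis_scheduled)
		return;		/* fast path: nothing to scan */
	sd->ipis_scheduled = 0;
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		sd->scan_count++;
		sd->mask[cpu] = 0;	/* kernel would send the IPI here */
	}
}
```

With the flag, an idle round costs one load; without it, every round pays
for a walk over nr_cpumask_bits bits even when the mask is empty.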