Message ID | 20190409093046.13401-1-zajec5@gmail.com |
---|---|
State | RFC |
Delegated to: | Rafał Miłecki |
Series | [OpenWrt-Devel,RFC] kernel: drop -fno-reorder-blocks |
On 09.04.2019 11:30, Rafał Miłecki wrote:
> 1) bcm53xx: BCM47094 SoC (echo 2 > rps_cpus)
>
> zImage size: 1840424 → 1871328 (+1,68%)
>
> a) gro off
> LAN to WAN: 824 Mb/s → 940 Mb/s (+14,08%)
> WAN to LAN: 935 Mb/s → 940 Mb/s (+0,53%)
>
> b) gro on
> LAN to WAN: 512 Mb/s → 534 Mb/s (+4,30%)
> WAN to LAN: 539 Mb/s → 549 Mb/s (+1,85%)

I was obviously curious why this change affects bcm53xx.

I tried using perf to profile the kernel before & after the change. I'm not
sure how to interpret the results. One thing I noticed is a lower CPU usage
share for __softirqentry_text_start. Could that be it?

You can see the FlameGraphs at files.zajec.net/openwrt/fno-reorder-blocks/

P.S.
I used FlameGraph's difffolded.pl to compare the LAN to WAN perf profiles
before and after the change. It seems to highlight the same thing:
__softirqentry_text_start.
I'm still unsure what it means and whether the same improvement can be
achieved any other way.

**********
LAN to WAN

1) Before the patch (824 Mb/s):
+    9,61%  swapper      [kernel.kallsyms]  [k] v7_dma_inv_range
+    6,22%  swapper      [kernel.kallsyms]  [k] __softirqentry_text_start
+    5,14%  swapper      [kernel.kallsyms]  [k] l2c210_inv_range
+    4,88%  ksoftirqd/1  [kernel.kallsyms]  [k] v7_dma_clean_range
+    3,93%  swapper      [kernel.kallsyms]  [k] bcma_host_soc_read32
+    3,43%  ksoftirqd/1  [kernel.kallsyms]  [k] __netif_receive_skb_core
+    3,01%  swapper      [kernel.kallsyms]  [k] arch_cpu_idle
+    2,81%  ksoftirqd/1  [kernel.kallsyms]  [k] l2c210_clean_range
+    2,15%  ksoftirqd/1  [kernel.kallsyms]  [k] bgmac_start_xmit
+    2,02%  swapper      [kernel.kallsyms]  [k] bgmac_poll
+    1,90%  ksoftirqd/1  [kernel.kallsyms]  [k] __dev_queue_xmit
+    1,73%  ksoftirqd/1  [kernel.kallsyms]  [k] nf_hook_slow
+    1,34%  ksoftirqd/1  [kernel.kallsyms]  [k] __local_bh_enable_ip
+    1,05%  ksoftirqd/1  [kernel.kallsyms]  [k] skb_pull_rcsum

2) After the patch (940 Mb/s):
+   11,07%  swapper      [kernel.kallsyms]  [k] v7_dma_inv_range
+    5,76%  swapper      [kernel.kallsyms]  [k] __softirqentry_text_start
+    5,72%  ksoftirqd/1  [kernel.kallsyms]  [k] v7_dma_clean_range
+    5,37%  swapper      [kernel.kallsyms]  [k] l2c210_inv_range
+    4,34%  swapper      [kernel.kallsyms]  [k] bcma_host_soc_read32
+    3,65%  ksoftirqd/1  [kernel.kallsyms]  [k] __netif_receive_skb_core
+    3,18%  ksoftirqd/1  [kernel.kallsyms]  [k] l2c210_clean_range
+    2,71%  swapper      [kernel.kallsyms]  [k] bgmac_poll
+    2,59%  swapper      [kernel.kallsyms]  [k] arch_cpu_idle
+    1,97%  ksoftirqd/1  [kernel.kallsyms]  [k] bgmac_start_xmit
+    1,67%  ksoftirqd/1  [kernel.kallsyms]  [k] __dev_queue_xmit
+    1,54%  ksoftirqd/1  [kernel.kallsyms]  [k] nf_hook_slow
+    1,16%  ksoftirqd/1  [kernel.kallsyms]  [k] ip_rcv
+    1,08%  ksoftirqd/1  [kernel.kallsyms]  [k] skb_pull_rcsum
+    1,07%  ksoftirqd/1  [kernel.kallsyms]  [k] netif_skb_features
+    1,04%  ksoftirqd/1  [kernel.kallsyms]  [k] __local_bh_enable_ip

**********
WAN to LAN

1) Before the patch (935 Mb/s):
+   10,55%  swapper      [kernel.kallsyms]  [k] v7_dma_inv_range
+    6,01%  swapper      [kernel.kallsyms]  [k] __softirqentry_text_start
+    5,56%  swapper      [kernel.kallsyms]  [k] l2c210_inv_range
+    5,55%  ksoftirqd/1  [kernel.kallsyms]  [k] v7_dma_clean_range
+    4,36%  swapper      [kernel.kallsyms]  [k] bcma_host_soc_read32
+    2,70%  ksoftirqd/1  [kernel.kallsyms]  [k] l2c210_clean_range
+    2,65%  swapper      [kernel.kallsyms]  [k] arch_cpu_idle
+    2,43%  ksoftirqd/1  [kernel.kallsyms]  [k] __netif_receive_skb_core
+    2,34%  ksoftirqd/1  [kernel.kallsyms]  [k] __dev_queue_xmit
+    2,19%  swapper      [kernel.kallsyms]  [k] bgmac_poll
+    2,08%  ksoftirqd/1  [kernel.kallsyms]  [k] bgmac_start_xmit
+    1,73%  ksoftirqd/1  [kernel.kallsyms]  [k] nf_hook_slow
+    1,52%  ksoftirqd/1  [kernel.kallsyms]  [k] __local_bh_enable_ip
+    1,45%  ksoftirqd/1  [kernel.kallsyms]  [k] ip_rcv
+    1,13%  ksoftirqd/1  [kernel.kallsyms]  [k] skb_pull_rcsum
+    1,11%  ksoftirqd/1  [kernel.kallsyms]  [k] ip_finish_output2
+    1,06%  ksoftirqd/1  [kernel.kallsyms]  [k] netif_skb_features

2) After the patch (940 Mb/s):
+   11,73%  swapper      [kernel.kallsyms]  [k] v7_dma_inv_range
+    6,05%  ksoftirqd/1  [kernel.kallsyms]  [k] v7_dma_clean_range
+    5,94%  swapper      [kernel.kallsyms]  [k] l2c210_inv_range
+    4,79%  swapper      [kernel.kallsyms]  [k] __softirqentry_text_start
+    4,08%  swapper      [kernel.kallsyms]  [k] bcma_host_soc_read32
+    3,05%  ksoftirqd/1  [kernel.kallsyms]  [k] __netif_receive_skb_core
+    2,98%  ksoftirqd/1  [kernel.kallsyms]  [k] l2c210_clean_range
+    2,53%  swapper      [kernel.kallsyms]  [k] bgmac_poll
+    2,36%  ksoftirqd/1  [kernel.kallsyms]  [k] __dev_queue_xmit
+    2,15%  ksoftirqd/1  [kernel.kallsyms]  [k] bgmac_start_xmit
+    2,10%  swapper      [kernel.kallsyms]  [k] arch_cpu_idle
+    1,64%  ksoftirqd/1  [kernel.kallsyms]  [k] nf_hook_slow
+    1,33%  ksoftirqd/1  [kernel.kallsyms]  [k] ip_rcv
+    1,28%  ksoftirqd/1  [kernel.kallsyms]  [k] netif_skb_features
+    1,27%  ksoftirqd/1  [kernel.kallsyms]  [k] __local_bh_enable_ip
+    1,02%  swapper      [kernel.kallsyms]  [k] __skb_flow_dissect
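
For anyone who wants to compare the before/after listings above symbol by
symbol without going through difffolded.pl, here is a minimal sketch. It is
not what was used to produce the numbers in this mail. The names parse_report
and share_deltas are made up for illustration, and the input is only the first
few LAN to WAN entries copied from the listings above. It assumes the
comma-decimal line format shown in this mail. Keep in mind that these are
shares of samples, so a falling share does not by itself prove that a
function got cheaper in absolute terms.

```python
import re

# Matches one perf report line in the format pasted in this mail, e.g.
# "+ 9,61% swapper [kernel.kallsyms] [k] v7_dma_inv_range"
# (assumption: comma or dot as decimal separator, kernel symbols only).
LINE_RE = re.compile(
    r"^\+?\s*(\d+)[,.](\d+)%\s+(\S+)\s+\[[^\]]+\]\s+\[k\]\s+(\S+)$"
)


def parse_report(text):
    """Return {(comm, symbol): percent} for every line that parses."""
    shares = {}
    for line in text.strip().splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            whole, frac, comm, symbol = match.groups()
            shares[(comm, symbol)] = float(f"{whole}.{frac}")
    return shares


def share_deltas(before, after):
    """Percentage-point change for entries present in both reports,
    sorted so that the biggest drop comes first."""
    common = before.keys() & after.keys()
    deltas = {key: round(after[key] - before[key], 2) for key in common}
    return sorted(deltas.items(), key=lambda item: item[1])


# Hypothetical input: a subset of the LAN to WAN listings from this mail.
BEFORE = """
+ 9,61% swapper [kernel.kallsyms] [k] v7_dma_inv_range
+ 6,22% swapper [kernel.kallsyms] [k] __softirqentry_text_start
+ 5,14% swapper [kernel.kallsyms] [k] l2c210_inv_range
+ 4,88% ksoftirqd/1 [kernel.kallsyms] [k] v7_dma_clean_range
+ 3,01% swapper [kernel.kallsyms] [k] arch_cpu_idle
"""

AFTER = """
+ 11,07% swapper [kernel.kallsyms] [k] v7_dma_inv_range
+ 5,76% swapper [kernel.kallsyms] [k] __softirqentry_text_start
+ 5,72% ksoftirqd/1 [kernel.kallsyms] [k] v7_dma_clean_range
+ 5,37% swapper [kernel.kallsyms] [k] l2c210_inv_range
+ 2,59% swapper [kernel.kallsyms] [k] arch_cpu_idle
"""

if __name__ == "__main__":
    rows = share_deltas(parse_report(BEFORE), parse_report(AFTER))
    for (comm, symbol), delta in rows:
        print(f"{delta:+.2f} pp  {comm:<12} {symbol}")
```

With this subset, __softirqentry_text_start and arch_cpu_idle are the only
entries whose share goes down, while the DMA/L2 cache maintenance functions
take a larger share.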
diff --git a/target/linux/generic/pending-4.14/201-extra_optimization.patch b/target/linux/generic/pending-4.14/201-extra_optimization.patch
index c7790657fd..3f7613d3dd 100644
--- a/target/linux/generic/pending-4.14/201-extra_optimization.patch
+++ b/target/linux/generic/pending-4.14/201-extra_optimization.patch
@@ -26,7 +26,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
 +KBUILD_CFLAGS += -O2 $(call cc-disable-warning,maybe-uninitialized,) $(EXTRA_OPTIMIZATION)
  else
 -KBUILD_CFLAGS += -O2
--+KBUILD_CFLAGS += -O2 -fno-reorder-blocks -fno-tree-ch $(EXTRA_OPTIMIZATION)
++KBUILD_CFLAGS += -O2 -fno-tree-ch $(EXTRA_OPTIMIZATION)
  endif
  endif
diff --git a/target/linux/generic/pending-4.19/201-extra_optimization.patch b/target/linux/generic/pending-4.19/201-extra_optimization.patch
index d86e29fc75..f002c49676 100644
--- a/target/linux/generic/pending-4.19/201-extra_optimization.patch
+++ b/target/linux/generic/pending-4.19/201-extra_optimization.patch
@@ -26,7 +26,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
 +KBUILD_CFLAGS += -O2 $(call cc-disable-warning,maybe-uninitialized,) $(EXTRA_OPTIMIZATION)
  else
 -KBUILD_CFLAGS += -O2
--+KBUILD_CFLAGS += -O2 -fno-reorder-blocks -fno-tree-ch $(EXTRA_OPTIMIZATION)
++KBUILD_CFLAGS += -O2 -fno-tree-ch $(EXTRA_OPTIMIZATION)
  endif
  endif