Message ID | 20210517140434.59555-14-cian.ferriter@intel.com |
---|---|
State | Changes Requested |
Headers | show |
Series | DPIF Framework + Optimizations | expand |
> As a small optimization, this patch caches the result of a CPU ISA > check from DPDK. Particularly in the case of running the DPCLS > autovalidator (which repeatedly probes subtables) this reduces > the amount of CPU ISA lookups from the DPDK level. > > By caching them at the OVS/dpdk.c level, the ISA checks remain > runtime for the CPU where they are executed, but subsequent checks > for the same ISA feature become much cheaper. > > Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> > Co-authored-by: Cian Ferriter <cian.ferriter@intel.com> > Signed-off-by: Cian Ferriter <cian.ferriter@intel.com> > Thanks for the patch Cian/Harry, to my mind this does not have a dependency on the previous AVX512 patches? So could be merged separately regardless? A few comments inline below also. > --- > > v8: Add NEWS entry. > --- > NEWS | 1 + > lib/dpdk.c | 28 ++++++++++++++++++++++++---- > 2 files changed, 25 insertions(+), 4 deletions(-) > > diff --git a/NEWS b/NEWS > index 98943c404..c71273ddd 100644 > --- a/NEWS > +++ b/NEWS > @@ -21,6 +21,7 @@ Post-v2.15.0 > - DPDK: > * OVS validated with DPDK 20.11.1. It is recommended to use this version > until further releases. > + * Cache results for CPU ISA checks, reduces overhead on repeated > lookups. Not sure if a news Item is required for this, it seems to be a small optimization over the existing implementation and it's not really exposed to the user. Out of interest do you have metrics regarding the overhead of the previous approach and the improvement this approach gives? Regards Ian > > > v2.15.0 - 15 Feb 2021 > diff --git a/lib/dpdk.c b/lib/dpdk.c > index 319540394..c883a4b8b 100644 > --- a/lib/dpdk.c > +++ b/lib/dpdk.c > @@ -614,13 +614,33 @@ print_dpdk_version(void) > puts(rte_version()); > } > > +/* Avoid calling rte_cpu_get_flag_enabled() excessively, by caching the > + * result of the call for each CPU flag in a static variable. To avoid > + * allocating large numbers of static variables, use a uint8 as a bitfield. > + * Note the macro must only return if the ISA check is done and available. > + */ > +#define ISA_CHECK_DONE_BIT (1 << 0) > +#define ISA_AVAILABLE_BIT (1 << 1) > + > #define CHECK_CPU_FEATURE(feature, name_str, RTE_CPUFLAG) \ > do { \ > if (strncmp(feature, name_str, strlen(name_str)) == 0) { \ > - int has_isa = rte_cpu_get_flag_enabled(RTE_CPUFLAG); \ > - VLOG_DBG("CPU flag %s, available %s\n", name_str, \ > - has_isa ? "yes" : "no"); \ > - return true; \ > + static uint8_t isa_check_##RTE_CPUFLAG; \ > + int check = isa_check_##RTE_CPUFLAG & ISA_CHECK_DONE_BIT; \ > + if (OVS_UNLIKELY(!check)) { \ > + int has_isa = rte_cpu_get_flag_enabled(RTE_CPUFLAG); \ > + VLOG_DBG("CPU flag %s, available %s\n", \ > + name_str, has_isa ? "yes" : "no"); \ > + isa_check_##RTE_CPUFLAG = ISA_CHECK_DONE_BIT; \ > + if (has_isa) { \ > + isa_check_##RTE_CPUFLAG |= ISA_AVAILABLE_BIT; \ > + } \ > + } \ > + if (isa_check_##RTE_CPUFLAG & ISA_AVAILABLE_BIT) { \ > + return true; \ > + } else { \ > + return false; \ > + } \ > } \ > } while (0) > > -- > 2.31.1 > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Hi Ian, Thanks for the review. My responses are inline. > -----Original Message----- > From: Stokes, Ian <ian.stokes@intel.com> > Sent: Wednesday 9 June 2021 16:23 > To: Ferriter, Cian <cian.ferriter@intel.com>; ovs-dev@openvswitch.org; Van Haaren, Harry <harry.van.haaren@intel.com> > Cc: i.maximets@ovn.org > Subject: RE: [ovs-dev] [v12 13/16] dpdk: Cache result of CPU ISA checks. > > > As a small optimization, this patch caches the result of a CPU ISA > > check from DPDK. Particularly in the case of running the DPCLS > > autovalidator (which repeatedly probes subtables) this reduces > > the amount of CPU ISA lookups from the DPDK level. > > > > By caching them at the OVS/dpdk.c level, the ISA checks remain > > runtime for the CPU where they are executed, but subsequent checks > > for the same ISA feature become much cheaper. > > > > Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> > > Co-authored-by: Cian Ferriter <cian.ferriter@intel.com> > > Signed-off-by: Cian Ferriter <cian.ferriter@intel.com> > > > > Thanks for the patch Cian/Harry, > > to my mind this does not have a dependency on the previous AVX512 patches? So could be merged separately regardless? > Correct, this could be applied independently of the previous AVX512 patches. > A few comments inline below also. > > --- > > > > v8: Add NEWS entry. > > --- > > NEWS | 1 + > > lib/dpdk.c | 28 ++++++++++++++++++++++++---- > > 2 files changed, 25 insertions(+), 4 deletions(-) > > > > diff --git a/NEWS b/NEWS > > index 98943c404..c71273ddd 100644 > > --- a/NEWS > > +++ b/NEWS > > @@ -21,6 +21,7 @@ Post-v2.15.0 > > - DPDK: > > * OVS validated with DPDK 20.11.1. It is recommended to use this version > > until further releases. > > + * Cache results for CPU ISA checks, reduces overhead on repeated > > lookups. > Not sure if a news Item is required for this, it seems to be a small optimization over the existing implementation and it's not really > exposed to the user. > That's fair. I'll remove this as a NEWS item in the next version. > Out of interest do you have metrics regarding the overhead of the previous approach and the improvement this approach gives? > This was done more for code cleanliness and design cleanliness. It was noticed that CPU ISA checks were taking a significant amount of cycles (over 5% of pmd processing time) when the autovalidator was being used. Since this isn't used in normal operation of OVS, this isn't on the critical path. Still, it's good not to waste cycles here. > > Regards > Ian > > > > > > v2.15.0 - 15 Feb 2021 > > diff --git a/lib/dpdk.c b/lib/dpdk.c > > index 319540394..c883a4b8b 100644 > > --- a/lib/dpdk.c > > +++ b/lib/dpdk.c > > @@ -614,13 +614,33 @@ print_dpdk_version(void) > > puts(rte_version()); > > } > > > > +/* Avoid calling rte_cpu_get_flag_enabled() excessively, by caching the > > + * result of the call for each CPU flag in a static variable. To avoid > > + * allocating large numbers of static variables, use a uint8 as a bitfield. > > + * Note the macro must only return if the ISA check is done and available. > > + */ > > +#define ISA_CHECK_DONE_BIT (1 << 0) > > +#define ISA_AVAILABLE_BIT (1 << 1) > > + > > #define CHECK_CPU_FEATURE(feature, name_str, RTE_CPUFLAG) \ > > do { \ > > if (strncmp(feature, name_str, strlen(name_str)) == 0) { \ > > - int has_isa = rte_cpu_get_flag_enabled(RTE_CPUFLAG); \ > > - VLOG_DBG("CPU flag %s, available %s\n", name_str, \ > > - has_isa ? "yes" : "no"); \ > > - return true; \ > > + static uint8_t isa_check_##RTE_CPUFLAG; \ > > + int check = isa_check_##RTE_CPUFLAG & ISA_CHECK_DONE_BIT; \ > > + if (OVS_UNLIKELY(!check)) { \ > > + int has_isa = rte_cpu_get_flag_enabled(RTE_CPUFLAG); \ > > + VLOG_DBG("CPU flag %s, available %s\n", \ > > + name_str, has_isa ? "yes" : "no"); \ > > + isa_check_##RTE_CPUFLAG = ISA_CHECK_DONE_BIT; \ > > + if (has_isa) { \ > > + isa_check_##RTE_CPUFLAG |= ISA_AVAILABLE_BIT; \ > > + } \ > > + } \ > > + if (isa_check_##RTE_CPUFLAG & ISA_AVAILABLE_BIT) { \ > > + return true; \ > > + } else { \ > > + return false; \ > > + } \ > > } \ > > } while (0) > > > > -- > > 2.31.1 > > > > _______________________________________________ > > dev mailing list > > dev@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> Hi Ian, > > Thanks for the review. My responses are inline. > > > -----Original Message----- > > From: Stokes, Ian <ian.stokes@intel.com> > > Sent: Wednesday 9 June 2021 16:23 > > To: Ferriter, Cian <cian.ferriter@intel.com>; ovs-dev@openvswitch.org; Van > Haaren, Harry <harry.van.haaren@intel.com> > > Cc: i.maximets@ovn.org > > Subject: RE: [ovs-dev] [v12 13/16] dpdk: Cache result of CPU ISA checks. > > > > > As a small optimization, this patch caches the result of a CPU ISA > > > check from DPDK. Particularly in the case of running the DPCLS > > > autovalidator (which repeatedly probes subtables) this reduces > > > the amount of CPU ISA lookups from the DPDK level. > > > > > > By caching them at the OVS/dpdk.c level, the ISA checks remain > > > runtime for the CPU where they are executed, but subsequent checks > > > for the same ISA feature become much cheaper. > > > > > > Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> > > > Co-authored-by: Cian Ferriter <cian.ferriter@intel.com> > > > Signed-off-by: Cian Ferriter <cian.ferriter@intel.com> > > > > > > > Thanks for the patch Cian/Harry, > > > > to my mind this does not have a dependency on the previous AVX512 patches? > So could be merged separately regardless? > > > > Correct, this could be applied independently of the previous AVX512 patches. > > > A few comments inline below also. > > > --- > > > > > > v8: Add NEWS entry. > > > --- > > > NEWS | 1 + > > > lib/dpdk.c | 28 ++++++++++++++++++++++++---- > > > 2 files changed, 25 insertions(+), 4 deletions(-) > > > > > > diff --git a/NEWS b/NEWS > > > index 98943c404..c71273ddd 100644 > > > --- a/NEWS > > > +++ b/NEWS > > > @@ -21,6 +21,7 @@ Post-v2.15.0 > > > - DPDK: > > > * OVS validated with DPDK 20.11.1. It is recommended to use this version > > > until further releases. > > > + * Cache results for CPU ISA checks, reduces overhead on repeated > > > lookups. > > Not sure if a news Item is required for this, it seems to be a small optimization > over the existing implementation and it's not really > > exposed to the user. > > > > That's fair. I'll remove this as a NEWS item in the next version. > > > Out of interest do you have metrics regarding the overhead of the previous > approach and the improvement this approach gives? > > > > This was done more for code cleanliness and design cleanliness. It was noticed > that CPU ISA checks were taking a significant amount of cycles (over 5% of pmd > processing time) when the autovalidator was being used. Since this isn't used in > normal operation of OVS, this isn't on the critical path. > > Still, it's good not to waste cycles here. Agreed, sounds like a good optimization. I'll leave it for you to decide if you want to submit to master separately, I have no issues applying as part of the series or individually. Regards Ian > > > > > Regards > > Ian > > > > > > > > > v2.15.0 - 15 Feb 2021 > > > diff --git a/lib/dpdk.c b/lib/dpdk.c > > > index 319540394..c883a4b8b 100644 > > > --- a/lib/dpdk.c > > > +++ b/lib/dpdk.c > > > @@ -614,13 +614,33 @@ print_dpdk_version(void) > > > puts(rte_version()); > > > } > > > > > > +/* Avoid calling rte_cpu_get_flag_enabled() excessively, by caching the > > > + * result of the call for each CPU flag in a static variable. To avoid > > > + * allocating large numbers of static variables, use a uint8 as a bitfield. > > > + * Note the macro must only return if the ISA check is done and available. > > > + */ > > > +#define ISA_CHECK_DONE_BIT (1 << 0) > > > +#define ISA_AVAILABLE_BIT (1 << 1) > > > + > > > #define CHECK_CPU_FEATURE(feature, name_str, RTE_CPUFLAG) \ > > > do { \ > > > if (strncmp(feature, name_str, strlen(name_str)) == 0) { \ > > > - int has_isa = rte_cpu_get_flag_enabled(RTE_CPUFLAG); \ > > > - VLOG_DBG("CPU flag %s, available %s\n", name_str, \ > > > - has_isa ? "yes" : "no"); \ > > > - return true; \ > > > + static uint8_t isa_check_##RTE_CPUFLAG; \ > > > + int check = isa_check_##RTE_CPUFLAG & ISA_CHECK_DONE_BIT; \ > > > + if (OVS_UNLIKELY(!check)) { \ > > > + int has_isa = rte_cpu_get_flag_enabled(RTE_CPUFLAG); \ > > > + VLOG_DBG("CPU flag %s, available %s\n", \ > > > + name_str, has_isa ? "yes" : "no"); \ > > > + isa_check_##RTE_CPUFLAG = ISA_CHECK_DONE_BIT; \ > > > + if (has_isa) { \ > > > + isa_check_##RTE_CPUFLAG |= ISA_AVAILABLE_BIT; \ > > > + } \ > > > + } \ > > > + if (isa_check_##RTE_CPUFLAG & ISA_AVAILABLE_BIT) { \ > > > + return true; \ > > > + } else { \ > > > + return false; \ > > > + } \ > > > } \ > > > } while (0) > > > > > > -- > > > 2.31.1 > > > > > > _______________________________________________ > > > dev mailing list > > > dev@openvswitch.org > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
diff --git a/NEWS b/NEWS index 98943c404..c71273ddd 100644 --- a/NEWS +++ b/NEWS @@ -21,6 +21,7 @@ Post-v2.15.0 - DPDK: * OVS validated with DPDK 20.11.1. It is recommended to use this version until further releases. + * Cache results for CPU ISA checks, reduces overhead on repeated lookups. v2.15.0 - 15 Feb 2021 diff --git a/lib/dpdk.c b/lib/dpdk.c index 319540394..c883a4b8b 100644 --- a/lib/dpdk.c +++ b/lib/dpdk.c @@ -614,13 +614,33 @@ print_dpdk_version(void) puts(rte_version()); } +/* Avoid calling rte_cpu_get_flag_enabled() excessively, by caching the + * result of the call for each CPU flag in a static variable. To avoid + * allocating large numbers of static variables, use a uint8 as a bitfield. + * Note the macro must only return if the ISA check is done and available. + */ +#define ISA_CHECK_DONE_BIT (1 << 0) +#define ISA_AVAILABLE_BIT (1 << 1) + #define CHECK_CPU_FEATURE(feature, name_str, RTE_CPUFLAG) \ do { \ if (strncmp(feature, name_str, strlen(name_str)) == 0) { \ - int has_isa = rte_cpu_get_flag_enabled(RTE_CPUFLAG); \ - VLOG_DBG("CPU flag %s, available %s\n", name_str, \ - has_isa ? "yes" : "no"); \ - return true; \ + static uint8_t isa_check_##RTE_CPUFLAG; \ + int check = isa_check_##RTE_CPUFLAG & ISA_CHECK_DONE_BIT; \ + if (OVS_UNLIKELY(!check)) { \ + int has_isa = rte_cpu_get_flag_enabled(RTE_CPUFLAG); \ + VLOG_DBG("CPU flag %s, available %s\n", \ + name_str, has_isa ? "yes" : "no"); \ + isa_check_##RTE_CPUFLAG = ISA_CHECK_DONE_BIT; \ + if (has_isa) { \ + isa_check_##RTE_CPUFLAG |= ISA_AVAILABLE_BIT; \ + } \ + } \ + if (isa_check_##RTE_CPUFLAG & ISA_AVAILABLE_BIT) { \ + return true; \ + } else { \ + return false; \ + } \ } \ } while (0)