[v2,1/3] compiler: define QEMU_CACHELINE_SIZE

Message ID 1496702979-26132-2-git-send-email-cota@braap.org
State New

Commit Message

Emilio Cota June 5, 2017, 10:49 p.m. UTC
This is a constant used as a hint for padding structs to hopefully avoid
false cache line sharing.

The constant can be set at configure time by defining QEMU_CACHELINE_SIZE
via --extra-cflags. If not set there, we try to obtain the value from
the machine running the configure script. If we fail, we default to
reasonable values, i.e. 128 bytes for ppc64 and 64 bytes for all others.

Note: the configure script only picks up the cache line size when run
on Linux hosts because I have no other platforms (e.g. Windows, BSD's)
to test on.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 configure               | 38 ++++++++++++++++++++++++++++++++++++++
 include/qemu/compiler.h | 17 +++++++++++++++++
 2 files changed, 55 insertions(+)
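
For readers unfamiliar with the padding use case, here is a minimal sketch of how such a constant might be applied. The struct and array names are hypothetical and not part of this series; the fallback values mirror the ones proposed in the patch, and QEMU_ALIGNED is reproduced from include/qemu/compiler.h so the snippet stands alone.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Fallback mirroring the patch; can be overridden with
 * -DQEMU_CACHELINE_SIZE=<n> on the compiler command line. */
#ifndef QEMU_CACHELINE_SIZE
# if defined(__powerpc64__)
#  define QEMU_CACHELINE_SIZE 128
# else
#  define QEMU_CACHELINE_SIZE 64
# endif
#endif

#define QEMU_ALIGNED(X) __attribute__((aligned(X)))

/* Hypothetical per-thread counter: aligning the type also rounds its
 * size up to a multiple of the alignment, so adjacent array elements
 * never share a line of the assumed size. */
struct PerThreadCounter {
    unsigned long hits;
} QEMU_ALIGNED(QEMU_CACHELINE_SIZE);

/* Hypothetical array with one slot per thread. */
struct PerThreadCounter counters[4];

static_assert(sizeof(struct PerThreadCounter) % QEMU_CACHELINE_SIZE == 0,
              "element size must be a multiple of the assumed line size");
static_assert(alignof(struct PerThreadCounter) == QEMU_CACHELINE_SIZE,
              "element must start on an assumed line boundary");
```

As the comment in compiler.h says, the value is only a hint: if the real line size is larger, the code stays correct and merely loses the false-sharing benefit.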

Comments

Pranith Kumar June 6, 2017, 5:39 a.m. UTC | #1
On Mon, Jun 5, 2017 at 6:49 PM, Emilio G. Cota <cota@braap.org> wrote:
> This is a constant used as a hint for padding structs to hopefully avoid
> false cache line sharing.
>
> The constant can be set at configure time by defining QEMU_CACHELINE_SIZE
> via --extra-cflags. If not set there, we try to obtain the value from
> the machine running the configure script. If we fail, we default to
> reasonable values, i.e. 128 bytes for ppc64 and 64 bytes for all others.
>
> Note: the configure script only picks up the cache line size when run
> on Linux hosts because I have no other platforms (e.g. Windows, BSD's)
> to test on.
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>  configure               | 38 ++++++++++++++++++++++++++++++++++++++
>  include/qemu/compiler.h | 17 +++++++++++++++++
>  2 files changed, 55 insertions(+)
>
> diff --git a/configure b/configure
> index 13e040d..6a68cb2 100755
> --- a/configure
> +++ b/configure
> @@ -4832,6 +4832,41 @@ EOF
>    fi
>  fi
>
> +# Find out the size of a cache line on the host
> +# TODO: support more platforms
> +cat > $TMPC<<EOF
> +#ifdef __linux__
> +
> +#include <stdio.h>
> +
> +#define SYSFS "/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size"
> +
> +int main(int argc, char *argv[])
> +{
> +    unsigned int size;
> +    FILE *fp;
> +
> +    fp = fopen(SYSFS, "r");
> +    if (fp == NULL) {
> +        return -1;
> +    }
> +    if (!fscanf(fp, "%u", &size)) {
> +        return -1;
> +    }
> +    return size;
> +}
> +#else
> +#error Cannot find host cache line size
> +#endif
> +EOF

Is there any reason not to use sysconf(_SC_LEVEL1_DCACHE_LINESIZE)?

Thanks,
Richard Henderson June 6, 2017, 8:18 a.m. UTC | #2
On 06/05/2017 10:39 PM, Pranith Kumar wrote:
> Is there any reason not to use sysconf(_SC_LEVEL1_DCACHE_LINESIZE)?

That's an excellent idea.  In fact... see reply to 3/3.


r~
Emilio Cota June 6, 2017, 4:11 p.m. UTC | #3
On Tue, Jun 06, 2017 at 01:39:45 -0400, Pranith Kumar wrote:
> On Mon, Jun 5, 2017 at 6:49 PM, Emilio G. Cota <cota@braap.org> wrote:
> > This is a constant used as a hint for padding structs to hopefully avoid
> > false cache line sharing.
> >
> > The constant can be set at configure time by defining QEMU_CACHELINE_SIZE
> > via --extra-cflags. If not set there, we try to obtain the value from
> > the machine running the configure script. If we fail, we default to
> > reasonable values, i.e. 128 bytes for ppc64 and 64 bytes for all others.
(snip)
> Is there any reason not to use sysconf(_SC_LEVEL1_DCACHE_LINESIZE)?

I tried using sysconf, but it doesn't work on the PowerPC machine I have
access to (it returns 0). It might be a machine-specific thing, though; I
don't know. Here's the machine's `uname -a':
  Linux gcc2-power8.osuosl.org 3.10.0-514.10.2.el7.ppc64le #1 SMP Fri Mar \
    3 16:16:38 GMT 2017 ppc64le ppc64le ppc64le GNU/Linux

		E.
Richard Henderson June 6, 2017, 5:39 p.m. UTC | #4
On 06/06/2017 09:11 AM, Emilio G. Cota wrote:
> On Tue, Jun 06, 2017 at 01:39:45 -0400, Pranith Kumar wrote:
>> On Mon, Jun 5, 2017 at 6:49 PM, Emilio G. Cota <cota@braap.org> wrote:
>>> This is a constant used as a hint for padding structs to hopefully avoid
>>> false cache line sharing.
>>>
>>> The constant can be set at configure time by defining QEMU_CACHELINE_SIZE
>>> via --extra-cflags. If not set there, we try to obtain the value from
>>> the machine running the configure script. If we fail, we default to
>>> reasonable values, i.e. 128 bytes for ppc64 and 64 bytes for all others.
> (snip)
>> Is there any reason not to use sysconf(_SC_LEVEL1_DCACHE_LINESIZE)?
> 
> I tried using sysconf, but it doesn't work on the PowerPC machine I have
> access to (it returns 0). It might be a machine-specific thing though-I
> don't know. Here's the machine's `uname -a':
>    Linux gcc2-power8.osuosl.org 3.10.0-514.10.2.el7.ppc64le #1 SMP Fri Mar \
>      3 16:16:38 GMT 2017 ppc64le ppc64le ppc64le GNU/Linux

Well that's unfortunate.

Doing some digging, the kernel has provided the info to userland via elf auxv 
data since the beginning of time (aka initial git repository build), but glibc 
still does not export that information properly for ppc.

For ppc, you can get what we want from qemu_getauxval(AT_ICACHEBSIZE).  Indeed, 
we already have 4 different system dependent methods for determining the icache 
size in tcg/ppc/tcg-target.inc.c.

So what I think we ought to do is create a new util/cachesize.c like so:

unsigned qemu_icache_linesize = 64;
unsigned qemu_dcache_linesize = 64;

static void init_icache_data(void)
{
#ifdef _SC_LEVEL1_ICACHE_LINESIZE
     {
         long x = sysconf(_SC_LEVEL1_ICACHE_LINESIZE);
         if (x > 0) {
             qemu_icache_linesize = x;
             return;
         }
     }
#endif
#ifdef AT_ICACHEBSIZE
     {
         unsigned long x = qemu_getauxval(AT_ICACHEBSIZE);
         if (x > 0) {
             qemu_icache_linesize = x;
             return;
         }
     }
#endif
     // Other system specific methods.
}

static void init_dcache_data(void)
{
     // Similarly.
}

static void __attribute__((constructor)) init_cache_data(void)
{
     init_icache_data();
     init_dcache_data();
}

In particular, I think you want to be padding to the icache linesize rather 
than the dcache linesize since what we're attempting is to avoid writable data 
in the icache.
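
A possible shape for the "Similarly" dcache case, written as a standalone sketch rather than the actual util/cachesize.c. Assumptions: glibc's getauxval() from <sys/auxv.h> stands in for qemu_getauxval(), the function name and the fallback parameter are hypothetical, and only the Linux methods discussed above are covered.

```c
#include <assert.h>
#include <unistd.h>
#if defined(__linux__)
# include <sys/auxv.h>   /* getauxval(); stand-in for qemu_getauxval() */
#endif

/* Hypothetical helper: returns the L1 dcache line size, or 'fallback'
 * when no method yields a positive value. */
static unsigned probe_dcache_linesize(unsigned fallback)
{
    /* The fallback itself should be a sane power of two. */
    assert(fallback != 0 && (fallback & (fallback - 1)) == 0);

#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    {
        /* May return 0 or -1 (e.g. the ppc64le report above). */
        long x = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
        if (x > 0) {
            return (unsigned)x;
        }
    }
#endif
#if defined(__linux__) && defined(AT_DCACHEBSIZE)
    {
        /* getauxval() returns 0 when the entry is absent. */
        unsigned long x = getauxval(AT_DCACHEBSIZE);
        if (x > 0) {
            return (unsigned)x;
        }
    }
#endif
    return fallback;
}
```

A caller would do something like `qemu_dcache_linesize = probe_dcache_linesize(64);` from the constructor; other system-specific methods would slot in before the final return.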


r~
Geert Martin Ijewski June 6, 2017, 8:28 p.m. UTC | #5
Am 06.06.2017 um 19:39 schrieb Richard Henderson:
> On 06/06/2017 09:11 AM, Emilio G. Cota wrote:
>> On Tue, Jun 06, 2017 at 01:39:45 -0400, Pranith Kumar wrote:
>>> On Mon, Jun 5, 2017 at 6:49 PM, Emilio G. Cota <cota@braap.org> wrote:
>>>> This is a constant used as a hint for padding structs to hopefully 
>>>> avoid
>>>> false cache line sharing.
>>>>
>>>> The constant can be set at configure time by defining 
>>>> QEMU_CACHELINE_SIZE
>>>> via --extra-cflags. If not set there, we try to obtain the value from
>>>> the machine running the configure script. If we fail, we default to
>>>> reasonable values, i.e. 128 bytes for ppc64 and 64 bytes for all 
>>>> others.
>> (snip)
>>> Is there any reason not to use sysconf(_SC_LEVEL1_DCACHE_LINESIZE)?
>>
>> I tried using sysconf, but it doesn't work on the PowerPC machine I have
>> access to (it returns 0). It might be a machine-specific thing though-I
>> don't know. Here's the machine's `uname -a':
>>    Linux gcc2-power8.osuosl.org 3.10.0-514.10.2.el7.ppc64le #1 SMP Fri Mar \
>>      3 16:16:38 GMT 2017 ppc64le ppc64le ppc64le GNU/Linux
> 
> Well that's unfortunate.
> 
> Doing some digging, the kernel has provided the info to userland via elf 
> auxv data since the beginning of time (aka initial git repository 
> build), but glibc still does not export that information properly for ppc.
> 
> For ppc, you can get what we want from qemu_getauxval(AT_ICACHEBSIZE).  
> Indeed, we already have 4 different system dependent methods for 
> determining the icache size in tcg/ppc/tcg-target.inc.c.
> 
> So what I think we ought to do is create a new util/cachesize.c like so:
> 
> unsigned qemu_icache_linesize = 64;
> unsigned qemu_dcache_linesize = 64;
> 
> static void init_icache_data(void)
> {
> #ifdef _SC_LEVEL1_ICACHE_LINESIZE
>      {
>          long x = sysconf(_SC_LEVEL1_ICACHE_LINESIZE);
>          if (x > 0) {
>              qemu_icache_linesize = x;
>              return;
>          }
>      }
> #endif
> #ifdef AT_ICACHEBSIZE
>      {
>          unsigned long x = qemu_getauxval(AT_ICACHEBSIZE);
>          if (x > 0) {
>              qemu_icache_linesize = x;
>              return;
>          }
>      }
> #endif
>      // Other system specific methods.

On a fully patched Windows 10 with an i5-4690 this code works for me (TM):

#ifdef _WIN32
     {
         DWORD bufferSize = 0;
         if (!GetLogicalProcessorInformation(0, &bufferSize) &&
                 GetLastError() == ERROR_INSUFFICIENT_BUFFER)
         {
             PSYSTEM_LOGICAL_PROCESSOR_INFORMATION buffer =
                 (PSYSTEM_LOGICAL_PROCESSOR_INFORMATION)g_malloc0(bufferSize);
             if (GetLogicalProcessorInformation(buffer, &bufferSize)) {
                 size_t i = 0,
                     numOfProcessors =
                         bufferSize /
                         sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION);
                 for (; i < numOfProcessors; i++) {
                     if (buffer[i].Relationship == RelationCache &&
                         buffer[i].Cache.Level == 1 &&
                         (  buffer[i].Cache.Type == CacheUnified ||
                            buffer[i].Cache.Type == CacheInstruction)
                         )
                     {
                         qemu_icache_linesize = buffer[i].Cache.LineSize;
                         break;
                     }
                 }
             }
             g_free(buffer);
         }
     }
#endif

I don't particularly like that staircase-of-ifs style, so if I were to do
a proper patch, this should become a function.
> }
> 
> static void init_dcache_data(void)
> {
>      // Similarly.

The code from above, just s/CacheInstruction/CacheData/ and 
s/qemu_icache/qemu_dcache/
> }
> 
> static void __attribute__((constructor)) init_cache_data(void)
> {
>      init_icache_data();
>      init_dcache_data();
> }
> 
> In particular, I think you want to be padding to the icache linesize 
> rather than the dcache linesize since what we're attempting is to avoid 
> writable data in the icache.
> 
> 
> r~
> 
> 

To quote from the documentation:
"RelationCache: [... snip ...]
Windows Server 2003:  This value is not supported until Windows Server 
2003 with SP1 and Windows XP Professional x64 Edition." --
https://msdn.microsoft.com/en-us/library/windows/desktop/ms686694(v=vs.85).aspx

I'm not sure whether that is considered a problem, as both systems have 
been out of support for almost two years now.

Geert
Emilio Cota June 6, 2017, 9:38 p.m. UTC | #6
On Tue, Jun 06, 2017 at 22:28:23 +0200, Geert Martin Ijewski wrote:
> On a fully patched Windows 10 with an i5-4690 this code works for me (TM):

Thanks!
Can you please test this?

		Emilio
---
#include "qemu/osdep.h"
#include <windows.h>

static unsigned int linesize_win(PROCESSOR_CACHE_TYPE type)
{
    PSYSTEM_LOGICAL_PROCESSOR_INFORMATION buf;
    DWORD size = 0;
    unsigned int ret = 0;
    BOOL success;
    size_t n;
    size_t i;

    success = GetLogicalProcessorInformation(0, &size);
    if (success || GetLastError() != ERROR_INSUFFICIENT_BUFFER) {
        return 0;
    }
    buf = (PSYSTEM_LOGICAL_PROCESSOR_INFORMATION)g_malloc0(size);
    if (!GetLogicalProcessorInformation(buf, &size)) {
        goto out;
    }

    n = size / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION);
    for (i = 0; i < n; i++) {
        if (buf[i].Relationship == RelationCache &&
            buf[i].Cache.Level == 1 &&
            (buf[i].Cache.Type == CacheUnified ||
             buf[i].Cache.Type == type)) {
            ret = buf[i].Cache.LineSize;
            break;
        }
    }
 out:
    g_free(buf);
    return ret;
}

linesize_win(CacheInstruction);
linesize_win(CacheData);
Geert Martin Ijewski June 6, 2017, 10:01 p.m. UTC | #7
Am 06.06.2017 um 23:38 schrieb Emilio G. Cota:
 > On Tue, Jun 06, 2017 at 22:28:23 +0200, Geert Martin Ijewski wrote:
 >> On a fully patched Windows 10 with an i5-4690 this code works for me (TM):
 >
 > Thanks!
 > Can you please test this?
 >
 > 		Emilio
 > ---
 > #include "qemu/osdep.h"
 > #include <windows.h>

Unnecessary, as it's already included via qemu/osdep.h -> sysemu/os-win32.h
 >
 > static unsigned int linesize_win(PROCESSOR_CACHE_TYPE type)
 > {
 >      PSYSTEM_LOGICAL_PROCESSOR_INFORMATION buf;
 >      DWORD size = 0;
 >      unsigned int ret = 0;
 >      BOOL success;
 >      size_t n;
 >      size_t i;
 >
 >      success = GetLogicalProcessorInformation(0, &size);
 >      if (success || GetLastError() != ERROR_INSUFFICIENT_BUFFER) {
 >          return 0;
 >      }
 >      buf = (PSYSTEM_LOGICAL_PROCESSOR_INFORMATION)g_malloc0(size);
 >      if (!GetLogicalProcessorInformation(buf, &size)) {
 >          goto out;
 >      }
 >
 >      n = size / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION);
 >      for (i = 0; i < n; i++) {
 >          if (buf[i].Relationship == RelationCache &&
 >              buf[i].Cache.Level == 1 &&
 >              (buf[i].Cache.Type == CacheUnified ||
 >               buf[i].Cache.Type == type)) {
 >              ret = buf[i].Cache.LineSize;
 >              break;
 >          }
 >      }
 >   out:
 >      g_free(buf);
 >      return ret;
 > }
 >
 > linesize_win(CacheInstruction);
 > linesize_win(CacheData);
 >
 >

Yes, that works.
Tested-by: Geert Martin Ijewski <gm.ijewski@web.de>
Patch

diff --git a/configure b/configure
index 13e040d..6a68cb2 100755
--- a/configure
+++ b/configure
@@ -4832,6 +4832,41 @@  EOF
   fi
 fi
 
+# Find out the size of a cache line on the host
+# TODO: support more platforms
+cat > $TMPC<<EOF
+#ifdef __linux__
+
+#include <stdio.h>
+
+#define SYSFS "/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size"
+
+int main(int argc, char *argv[])
+{
+    unsigned int size;
+    FILE *fp;
+
+    fp = fopen(SYSFS, "r");
+    if (fp == NULL) {
+        return -1;
+    }
+    if (!fscanf(fp, "%u", &size)) {
+        return -1;
+    }
+    return size;
+}
+#else
+#error Cannot find host cache line size
+#endif
+EOF
+
+host_cacheline_size=0
+if compile_prog "" "" ; then
+    ./$TMPE
+    host_cacheline_size=$?
+fi
+
+
 ##########################################
 # check for _Static_assert()
 
@@ -5284,6 +5319,9 @@  fi
 if test "$bigendian" = "yes" ; then
   echo "HOST_WORDS_BIGENDIAN=y" >> $config_host_mak
 fi
+if test "$host_cacheline_size" -gt 0 ; then
+    echo "HOST_CACHELINE_SIZE=$host_cacheline_size" >> $config_host_mak
+fi
 if test "$mingw32" = "yes" ; then
   echo "CONFIG_WIN32=y" >> $config_host_mak
   rc_version=$(cat $source_path/VERSION)
diff --git a/include/qemu/compiler.h b/include/qemu/compiler.h
index 340e5fd..178d831 100644
--- a/include/qemu/compiler.h
+++ b/include/qemu/compiler.h
@@ -40,6 +40,23 @@ 
 # define QEMU_PACKED __attribute__((packed))
 #endif
 
+/*
+ * Cache line size of the host. Can be overridden.
+ * Note that this is just a compile-time hint to hopefully avoid false sharing
+ * of cache lines; code must be correct regardless of the constant's value.
+ */
+#ifndef QEMU_CACHELINE_SIZE
+# ifdef HOST_CACHELINE_SIZE
+#  define QEMU_CACHELINE_SIZE HOST_CACHELINE_SIZE
+# else
+#  if defined(__powerpc64__)
+#   define QEMU_CACHELINE_SIZE 128
+#  else
+#   define QEMU_CACHELINE_SIZE 64
+#  endif
+# endif
+#endif
+
 #define QEMU_ALIGNED(X) __attribute__((aligned(X)))
 
 #ifndef glue
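
One detail of the configure probe worth keeping in mind: the shell only sees the low 8 bits of a process exit status, so a line size of 256 would read back as 0 through `$?`, and a `return -1` would read back as 255. Separately, fscanf() returns EOF (which is nonzero) on an empty file, so testing `!fscanf(...)` does not catch that case. A hedged sketch of an alternative that prints the value instead; the function names are hypothetical and this is not what the patch does:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns the value read from 'path', or 0 when
 * the file is missing or does not start with an unsigned integer. */
static unsigned read_line_size(const char *path)
{
    unsigned size = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL) {
        return 0;
    }
    /* Compare against the number of expected conversions so that both
     * a matching failure (0) and EOF (-1) are treated as errors. */
    if (fscanf(fp, "%u", &size) != 1) {
        size = 0;
    }
    fclose(fp);
    return size;
}

/* Hypothetical helper: writes the size to 'out' so a script could
 * capture it with $(...); returns 0 on success and 1 on failure. */
static int report_line_size(const char *path, FILE *out)
{
    unsigned size = read_line_size(path);

    assert(out != NULL);
    if (size == 0) {
        return 1;
    }
    fprintf(out, "%u\n", size);
    return 0;
}
```

A main() would call report_line_size() on the coherency_line_size path from the patch and pass stdout; the exit status then only signals success or failure.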