[v2] um: enable the use of optimized xor routines in UML

Message ID 20201112103337.4862-1-anton.ivanov@cambridgegreys.com
State Superseded
Series [v2] um: enable the use of optimized xor routines in UML

Commit Message

Anton Ivanov Nov. 12, 2020, 10:33 a.m. UTC
From: Anton Ivanov <anton.ivanov@cambridgegreys.com>

This patch enables the use of the optimized xor routines from the x86
tree and supplies the macros needed for them to be used in
UML.

The macros supply several "fake" flags and definitions to allow
using the x86 files "as is".

This patchset implements only the flags needed for the optimized
strings.h, xor.h and checksum.h implementations instead of
trying to copy the entire x86 flags environment.

Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
---
 arch/um/include/asm/processor-generic.h |  3 +++
 arch/um/include/asm/xor-x86.h           |  1 +
 arch/um/include/asm/xor.h               | 17 ++++++++++++-
 arch/um/include/asm/xor_32.h            |  1 +
 arch/um/include/asm/xor_64.h            |  1 +
 arch/um/include/asm/xor_avx.h           |  1 +
 arch/um/include/shared/os.h             |  1 +
 arch/um/kernel/um_arch.c                | 17 +++++++++++--
 arch/um/os-Linux/start_up.c             | 32 +++++++++++++++++++++++++
 9 files changed, 71 insertions(+), 3 deletions(-)
 create mode 120000 arch/um/include/asm/xor-x86.h
 create mode 120000 arch/um/include/asm/xor_32.h
 create mode 120000 arch/um/include/asm/xor_64.h
 create mode 120000 arch/um/include/asm/xor_avx.h

Comments

Johannes Berg Nov. 12, 2020, 11 a.m. UTC | #1
On Thu, 2020-11-12 at 10:33 +0000, anton.ivanov@cambridgegreys.com
wrote:
> 
> +++ b/arch/um/include/asm/xor.h
> @@ -1,7 +1,22 @@
>  /* SPDX-License-Identifier: GPL-2.0 */
> -#include <asm-generic/xor.h>
> +#ifndef _ASM_UM_XOR_H
> +#define _ASM_UM_XOR_H
> +
> +#ifdef CONFIG_64BIT
> +#undef CONFIG_X86_32
> +#else
> +#define CONFIG_X86_32 "Y"
> +#endif

I thought you said it was some leftover? :)

> +	for (index = 0; index < MAX_UM_CPU_FEATURES; index++) {
> +		if (boot_cpu_data.host_features & (1 << index))
> +			seq_printf(m, " %s", host_cpu_feature_names[index]);

Confused. Now I don't see MAX_UM_CPU_FEATURES or
host_cpu_feature_names[] anywhere now?

johannes
Anton Ivanov Nov. 12, 2020, 11:05 a.m. UTC | #2
On 12/11/2020 11:00, Johannes Berg wrote:
> On Thu, 2020-11-12 at 10:33 +0000, anton.ivanov@cambridgegreys.com
> wrote:
>> +++ b/arch/um/include/asm/xor.h
>> @@ -1,7 +1,22 @@
>>   /* SPDX-License-Identifier: GPL-2.0 */
>> -#include <asm-generic/xor.h>
>> +#ifndef _ASM_UM_XOR_H
>> +#define _ASM_UM_XOR_H
>> +
>> +#ifdef CONFIG_64BIT
>> +#undef CONFIG_X86_32
>> +#else
>> +#define CONFIG_X86_32 "Y"
>> +#endif
> I thought you said it was some leftover? :)
>
>> +	for (index = 0; index < MAX_UM_CPU_FEATURES; index++) {
>> +		if (boot_cpu_data.host_features & (1 << index))
>> +			seq_printf(m, " %s", host_cpu_feature_names[index]);
> Confused. Now I don't see MAX_UM_CPU_FEATURES or
> host_cpu_feature_names[] anywhere now?


Ooops... forgot to include a file. Broken patch.


>
> johannes
>
>
Richard Weinberger Nov. 12, 2020, 11:09 a.m. UTC | #3
Anton,

----- Original Message -----
> From: "anton ivanov" <anton.ivanov@cambridgegreys.com>
> To: "linux-um" <linux-um@lists.infradead.org>
> CC: "richard" <richard@nod.at>, "anton ivanov" <anton.ivanov@cambridgegreys.com>
> Sent: Thursday, 12 November 2020 11:33:37
> Subject: [PATCH v2] um: enable the use of optimized xor routines in UML

> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> 
> This patch enable the use of optimized xor routines from the x86
> tree as well as supply the necessary macros for them to be used in
> UML.
> 
> The macros supply several "fake" flags and definitions to allow
> using the x86 files "as is".
> 
> This patchset implements only the flags needed for the optimized
> strings.h, xor.h and checksum.h implementations instead of
> trying to copy the entire x86 flags environment.

So, the plan is using xor methods from arch/x86 and string methods
from glibc?

Thanks,
//richard
Anton Ivanov Nov. 12, 2020, 11:19 a.m. UTC | #4
On 12/11/2020 11:09, Richard Weinberger wrote:
> Anton,
>
> ----- Original Message -----
>> From: "anton ivanov" <anton.ivanov@cambridgegreys.com>
>> To: "linux-um" <linux-um@lists.infradead.org>
>> CC: "richard" <richard@nod.at>, "anton ivanov" <anton.ivanov@cambridgegreys.com>
>> Sent: Thursday, 12 November 2020 11:33:37
>> Subject: [PATCH v2] um: enable the use of optimized xor routines in UML
>> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>>
>> This patch enable the use of optimized xor routines from the x86
>> tree as well as supply the necessary macros for them to be used in
>> UML.
>>
>> The macros supply several "fake" flags and definitions to allow
>> using the x86 files "as is".
>>
>> This patchset implements only the flags needed for the optimized
>> strings.h, xor.h and checksum.h implementations instead of
>> trying to copy the entire x86 flags environment.
> So, the plan is using xor methods from arch/x86 and string methods
> from glibc?

That is my proposal.

XOR uses a function table and there is no runtime patching, so we can just pull it in and "cheat" a bit on the macros, so that we do not have to implement the whole 32+ bytes of Intel features and bugs.
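
To illustrate why a function table needs no runtime patching, here is a minimal userspace sketch. It is not the kernel's code: `struct xor_template`, `FEATURE_WIDE`, `templates` and `select_template()` are hypothetical names invented for this example, and the real x86 headers define their own template structures and selection macros. The sketch only shows that choosing an implementation amounts to checking a feature bit and picking a table entry.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for an xor template table entry. */
struct xor_template {
	const char *name;
	int required_feature;	/* bit index in the feature mask, -1 if none */
	void (*do_2)(size_t bytes, unsigned long *dst, const unsigned long *src);
};

/* Plain word-at-a-time xor: always usable. */
static void xor_words(size_t bytes, unsigned long *dst, const unsigned long *src)
{
	size_t i;

	for (i = 0; i < bytes / sizeof(unsigned long); i++)
		dst[i] ^= src[i];
}

/* Unrolled variant standing in for an "optimized" routine. */
static void xor_unrolled4(size_t bytes, unsigned long *dst, const unsigned long *src)
{
	size_t words = bytes / sizeof(unsigned long);
	size_t i = 0;

	for (; i + 4 <= words; i += 4) {
		dst[i] ^= src[i];
		dst[i + 1] ^= src[i + 1];
		dst[i + 2] ^= src[i + 2];
		dst[i + 3] ^= src[i + 3];
	}
	for (; i < words; i++)	/* leftover words */
		dst[i] ^= src[i];
}

enum { FEATURE_WIDE = 0 };	/* hypothetical feature bit */

static const struct xor_template templates[] = {
	{ "unrolled4", FEATURE_WIDE, xor_unrolled4 },
	{ "generic", -1, xor_words },
};

/* Return the first template whose feature requirement is satisfied. */
static const struct xor_template *select_template(unsigned long host_features)
{
	size_t i;

	for (i = 0; i < sizeof(templates) / sizeof(templates[0]); i++) {
		int bit = templates[i].required_feature;

		if (bit < 0 || (host_features & (1UL << bit)))
			return &templates[i];
	}
	return NULL;
}
```

A caller would invoke `select_template(features)->do_2(len, dst, src)`. Nothing in the instruction stream is rewritten, which is why only a handful of feature flags are needed here.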

strings.h, however, contains a number of functions which use runtime patching on x86. To use them we would need to implement that, which actually requires pulling in all x86 features, because the entries in the alternatives table are tagged by "CPU feature". It is easier to pull in the glibc ones instead. That does not prevent us from implementing the patching at a later date; it is just an alternative implementation.

Checksum, if we do it, is just a reorganization and removal of duplicated code. We have a lot of snippets from the x86 tree cut-and-pasted into our files under x86/um/. We should probably clean that up. It is only a cleanup, though; there is no performance advantage there.

>
> Thanks,
> //richard
>
> _______________________________________________
> linux-um mailing list
> linux-um@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-um
Anton Ivanov Nov. 12, 2020, 2:18 p.m. UTC | #5
On 12/11/2020 11:09, Richard Weinberger wrote:
> Anton,
>
> ----- Original Message -----
>> From: "anton ivanov" <anton.ivanov@cambridgegreys.com>
>> To: "linux-um" <linux-um@lists.infradead.org>
>> CC: "richard" <richard@nod.at>, "anton ivanov" <anton.ivanov@cambridgegreys.com>
>> Sent: Thursday, 12 November 2020 11:33:37
>> Subject: [PATCH v2] um: enable the use of optimized xor routines in UML
>> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>>
>> This patch enable the use of optimized xor routines from the x86
>> tree as well as supply the necessary macros for them to be used in
>> UML.
>>
>> The macros supply several "fake" flags and definitions to allow
>> using the x86 files "as is".
>>
>> This patchset implements only the flags needed for the optimized
>> strings.h, xor.h and checksum.h implementations instead of
>> trying to copy the entire x86 flags environment.
> So, the plan is using xor methods from arch/x86 and string methods
> from glibc?

Atomics also need to come from arch/x86. That is another one. We currently fall back to the generic atomics, which are along the lines of:

interrupts_off
do atomic op
interrupts_on

That is very expensive, especially compared to an existing proper atomic op.

We can do it only on 64-bit, though: the 64-bit atomics on 32-bit x86 use alternatives to switch between versions. It is the same problem as with string.h.

This squeezes a few cycles here and there too. It is not particularly noticeable, though; the gain is within "experimental error".
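
The contrast between the two styles can be sketched in userspace C11. This is an illustration only, not the asm-generic or arch/x86 code: `interrupts_off()` and `interrupts_on()` are hypothetical stand-ins modelled as a counter, so whatever they really cost in UML is not represented, and `generic_add_return()` and `native_add_return()` are names made up for the example.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for the interrupt toggles, modelled as a counter. */
static int irq_toggles;

static void interrupts_off(void)
{
	irq_toggles++;
}

static void interrupts_on(void)
{
	irq_toggles++;
}

/* Generic style: a plain read-modify-write bracketed by two toggles. */
static int generic_add_return(int *v, int delta)
{
	int ret;

	interrupts_off();
	*v += delta;
	ret = *v;
	interrupts_on();
	return ret;
}

/* Native style: one atomic read-modify-write, no toggles at all. */
static int native_add_return(atomic_int *v, int delta)
{
	/* atomic_fetch_add returns the old value, so add delta for the new one */
	return atomic_fetch_add(v, delta) + delta;
}
```

Every call to the generic variant pays for two toggle operations on top of the arithmetic, while the native variant is a single read-modify-write.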

Patch will follow shortly.

>
> Thanks,
> //richard
Anton Ivanov Nov. 12, 2020, 3 p.m. UTC | #6
On 12/11/2020 14:18, Anton Ivanov wrote:
> 
> On 12/11/2020 11:09, Richard Weinberger wrote:
>> Anton,
>>
>> ----- Original Message -----
>>> From: "anton ivanov" <anton.ivanov@cambridgegreys.com>
>>> To: "linux-um" <linux-um@lists.infradead.org>
>>> CC: "richard" <richard@nod.at>, "anton ivanov" <anton.ivanov@cambridgegreys.com>
>>> Sent: Thursday, 12 November 2020 11:33:37
>>> Subject: [PATCH v2] um: enable the use of optimized xor routines in UML
>>> From: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>>>
>>> This patch enable the use of optimized xor routines from the x86
>>> tree as well as supply the necessary macros for them to be used in
>>> UML.
>>>
>>> The macros supply several "fake" flags and definitions to allow
>>> using the x86 files "as is".
>>>
>>> This patchset implements only the flags needed for the optimized
>>> strings.h, xor.h and checksum.h implementations instead of
>>> trying to copy the entire x86 flags environment.
>> So, the plan is using xor methods from arch/x86 and string methods
>> from glibc?
> 
> Atomics also need to come from arch/x86. That is another one. We fall back to generic atomics which are along the lines of:
> 
> interrupts_off
> 
> do atomic op
> 
> interrupts_on
> 
> That is very expensive especially compared to an existing proper atomic op.
> 
> We can do it only on 64 bit though, the 64 bit atomics on 32bit x86 use alternatives to switch between versions - it is the same problem as with string.h
> 
> This squeezes a few cycles here and there too. It is not particularly noticeable though. The gain is within "experimental error".
> 
> Patch will follow shortly.

There are two more.

One is easy - barrier.h

We can just steal it from the x86 tree for the 64-bit case. The same story with alternatives applies to the 32-bit case. Patch coming up shortly.

The other one is futex. The generic one does not implement all ops, which causes workarounds at higher layers, and it is fairly expensive. How expensive is difficult to say, as I do not know how much it costs glibc to work around the ENOSYS it returns on some of the atomic ops.

In the parts which are implemented it does:

get_user - this results in:
	page_in
	uaccess_check()
attempt at an atomic op on value from get_user
put_user on the result - this results in
	page_in
	uaccess_check

While it should be
page_in
uaccess_check()
atomic_op() directly on futex target address

This one will be harder and will take some time, and it relies on having real barriers and atomics (the other patches are a prerequisite).

Based on looking at what we are picking up from asm-generic, this should be about it. We have the rest implemented already and/or the asm-generic and lib/ functions are as good as any potential replacement.

> 
>>
>> Thanks,
>> //richard
Patch

diff --git a/arch/um/include/asm/processor-generic.h b/arch/um/include/asm/processor-generic.h
index afd9b267cf81..b8bcddbb1898 100644
--- a/arch/um/include/asm/processor-generic.h
+++ b/arch/um/include/asm/processor-generic.h
@@ -90,6 +90,9 @@  extern void start_thread(struct pt_regs *regs, unsigned long entry,
 struct cpuinfo_um {
 	unsigned long loops_per_jiffy;
 	int ipi_pipe[2];
+	/* There is only a small set of x86 features we are interested
+	 * in for now */
+	unsigned long host_features;
 };
 
 extern struct cpuinfo_um boot_cpu_data;
diff --git a/arch/um/include/asm/xor-x86.h b/arch/um/include/asm/xor-x86.h
new file mode 120000
index 000000000000..beff7de6890d
--- /dev/null
+++ b/arch/um/include/asm/xor-x86.h
@@ -0,0 +1 @@ 
+../../../x86/include/asm/xor.h
\ No newline at end of file
diff --git a/arch/um/include/asm/xor.h b/arch/um/include/asm/xor.h
index 36b33d62a35d..3c2c67698908 100644
--- a/arch/um/include/asm/xor.h
+++ b/arch/um/include/asm/xor.h
@@ -1,7 +1,22 @@ 
 /* SPDX-License-Identifier: GPL-2.0 */
-#include <asm-generic/xor.h>
+#ifndef _ASM_UM_XOR_H
+#define _ASM_UM_XOR_H
+
+#ifdef CONFIG_64BIT
+#undef CONFIG_X86_32
+#else
+#define CONFIG_X86_32 "Y"
+#endif
+
+#include <asm/cpufeature.h>
+#include <asm/xor-x86.h>
 #include <linux/time-internal.h>
 
+#ifdef CONFIG_UML_TIME_TRAVEL_SUPPORT
+#undef XOR_SELECT_TEMPLATE
 /* pick an arbitrary one - measuring isn't possible with inf-cpu */
 #define XOR_SELECT_TEMPLATE(x)	\
 	(time_travel_mode == TT_MODE_INFCPU ? &xor_block_8regs : NULL)
+#endif
+
+#endif
diff --git a/arch/um/include/asm/xor_32.h b/arch/um/include/asm/xor_32.h
new file mode 120000
index 000000000000..8a0894e996d7
--- /dev/null
+++ b/arch/um/include/asm/xor_32.h
@@ -0,0 +1 @@ 
+../../../x86/include/asm/xor_32.h
\ No newline at end of file
diff --git a/arch/um/include/asm/xor_64.h b/arch/um/include/asm/xor_64.h
new file mode 120000
index 000000000000..b8d346c516bf
--- /dev/null
+++ b/arch/um/include/asm/xor_64.h
@@ -0,0 +1 @@ 
+../../../x86/include/asm/xor_64.h
\ No newline at end of file
diff --git a/arch/um/include/asm/xor_avx.h b/arch/um/include/asm/xor_avx.h
new file mode 120000
index 000000000000..370ded122095
--- /dev/null
+++ b/arch/um/include/asm/xor_avx.h
@@ -0,0 +1 @@ 
+../../../x86/include/asm/xor_avx.h
\ No newline at end of file
diff --git a/arch/um/include/shared/os.h b/arch/um/include/shared/os.h
index f467d28fc0b4..c2ff855af603 100644
--- a/arch/um/include/shared/os.h
+++ b/arch/um/include/shared/os.h
@@ -187,6 +187,7 @@  int os_poll(unsigned int n, const int *fds);
 extern void os_early_checks(void);
 extern void os_check_bugs(void);
 extern void check_host_supports_tls(int *supports_tls, int *tls_min);
+extern unsigned long check_host_cpu_features(const char **feature_names, int n);
 
 /* mem.c */
 extern int create_mem_file(unsigned long long len);
diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c
index 76b37297b7d4..b7dfc4fcc130 100644
--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -15,6 +15,7 @@ 
 #include <linux/kmsg_dump.h>
 
 #include <asm/processor.h>
+#include <asm/cpufeature.h>
 #include <asm/sections.h>
 #include <asm/setup.h>
 #include <as-layout.h>
@@ -48,9 +49,12 @@  static void __init add_arg(char *arg)
  */
 struct cpuinfo_um boot_cpu_data = {
 	.loops_per_jiffy	= 0,
-	.ipi_pipe		= { -1, -1 }
+	.ipi_pipe		= { -1, -1 },
+	.host_features		= 0
 };
 
+EXPORT_SYMBOL(boot_cpu_data);
+
 union thread_union cpu0_irqstack
 	__section(".data..init_irqstack") =
 		{ .thread_info = INIT_THREAD_INFO(init_task) };
@@ -67,9 +71,15 @@  static int show_cpuinfo(struct seq_file *m, void *v)
 	seq_printf(m, "model name\t: UML\n");
 	seq_printf(m, "mode\t\t: skas\n");
 	seq_printf(m, "host\t\t: %s\n", host_info);
-	seq_printf(m, "bogomips\t: %lu.%02lu\n\n",
+	seq_printf(m, "bogomips\t: %lu.%02lu\n",
 		   loops_per_jiffy/(500000/HZ),
 		   (loops_per_jiffy/(5000/HZ)) % 100);
+	seq_printf(m, "flags\t\t:");
+	for (index = 0; index < MAX_UM_CPU_FEATURES; index++) {
+		if (boot_cpu_data.host_features & (1 << index))
+			seq_printf(m, " %s", host_cpu_feature_names[index]);
+	}
+	seq_printf(m, "\n\n");
 
 	return 0;
 }
@@ -275,6 +285,9 @@  int __init linux_main(int argc, char **argv)
 	/* OS sanity checks that need to happen before the kernel runs */
 	os_early_checks();
 
+	boot_cpu_data.host_features =
+		check_host_cpu_features(host_cpu_feature_names, MAX_UM_CPU_FEATURES);
+
 	brk_start = (unsigned long) sbrk(0);
 
 	/*
diff --git a/arch/um/os-Linux/start_up.c b/arch/um/os-Linux/start_up.c
index f79dc338279e..be884ed86b30 100644
--- a/arch/um/os-Linux/start_up.c
+++ b/arch/um/os-Linux/start_up.c
@@ -321,6 +321,38 @@  static void __init check_coredump_limit(void)
 		os_info("%llu\n", (unsigned long long)lim.rlim_max);
 }
 
+unsigned long  __init check_host_cpu_features(const char **feature_names, int n)
+{
+	FILE *cpuinfo;
+	char *line = NULL;
+	size_t len = 0;
+	int i;
+	bool done_parsing = false;
+	unsigned long result = 0;
+
+	cpuinfo = fopen("/proc/cpuinfo", "r");
+	if (cpuinfo == NULL) {
+		os_info("Failed to get host CPU features\n");
+	} else {
+		while ((getline(&line, &len, cpuinfo)) != -1) {
+			if (strstr(line, "flags")) {
+				for (i = 0; i < n; i++) {
+					if (strstr(line, feature_names[i])) {
+						result |= (1 << i);
+					}
+				}
+				done_parsing = true;
+			}
+			free(line);
+			line = NULL;
+			if (done_parsing)
+				break;
+		}
+		fclose(cpuinfo);
+	}
+	return result;
+}
+
 void __init os_early_checks(void)
 {
 	int pid;