
um: borrow bitops from the x86 tree

Message ID 20201116144426.8415-1-anton.ivanov@cambridgegreys.com
State Superseded
Series um: borrow bitops from the x86 tree

Commit Message

Anton Ivanov Nov. 16, 2020, 2:44 p.m. UTC
From: Anton Ivanov <anton.ivanov@cambridgegreys.com>

Using the x86 bitops instead of the asm-generic ones squeezes out
an improvement of a couple of percent in fs IO under UML. It
should improve other areas as well.
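As a rough illustration of the kind of difference involved: the asm-generic fallbacks are plain C, while the x86 header maps operations such as `__ffs()` onto single instructions (`bsf`). The sketch below contrasts a binary-search find-first-set, written in the style of a generic C fallback, with `__builtin_ctzl()`, which GCC and Clang lower to `bsf`/`tzcnt` on x86. The function names are hypothetical and this is not the kernel code.

```c
#include <assert.h>

/*
 * Find the index of the lowest set bit with a binary search over halves
 * of the word, in the style of a portable C fallback. The caller must not
 * pass 0 (same contract as the kernel's __ffs()).
 */
static unsigned long ffs_generic(unsigned long word)
{
	unsigned long num = 0;

	assert(word != 0);
#if __SIZEOF_LONG__ == 8
	if ((word & 0xffffffffUL) == 0) {
		num += 32;
		word >>= 32;
	}
#endif
	if ((word & 0xffffUL) == 0) {
		num += 16;
		word >>= 16;
	}
	if ((word & 0xffUL) == 0) {
		num += 8;
		word >>= 8;
	}
	if ((word & 0xfUL) == 0) {
		num += 4;
		word >>= 4;
	}
	if ((word & 0x3UL) == 0) {
		num += 2;
		word >>= 2;
	}
	if ((word & 0x1UL) == 0)
		num += 1;
	return num;
}

/*
 * Same result via a compiler builtin; GCC and Clang emit a single
 * bsf/tzcnt instruction for this on x86.
 */
static unsigned long ffs_builtin(unsigned long word)
{
	assert(word != 0);
	return (unsigned long)__builtin_ctzl(word);
}
```

For `0xF0F0` both return 4; the generic version gets there through a chain of tests and shifts, while the builtin compiles to one instruction.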

Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
---
 arch/um/include/asm/bitops-x86.h |  1 +
 arch/um/include/asm/bitops.h     | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)
 create mode 120000 arch/um/include/asm/bitops-x86.h
 create mode 100644 arch/um/include/asm/bitops.h

Comments

Johannes Berg Nov. 17, 2020, 11:05 a.m. UTC | #1
Hi Anton,

So I thought I'd test your performance patches here, and applied
(hopefully the latest versions of) them on top of 5.9:

      um: allow the use of glibc functions instead of builtins
      um: Fetch registers only for signals which need them
      um: enable the use of optimized xor routines in UML
      um: add a UML specific futex implementation
      um: Remove use of asprinf in umid.c
      um: "borrow" atomics from x86 architecture
      um: "borrow" cmpxchg from x86 tree in UML
      um: borrow bitops from the x86 tree


With the patches (compiled with glibc functions), one of my trivial
virtual lab tests gets:

  Time (mean ± σ):     15.918 s ±  0.833 s    [User: 10.977 s, System: 5.600 s]
  Range (min … max):   15.371 s … 17.986 s    10 runs

It's not a large improvement, but it seems noticeable; without the patches I
get:

  Time (mean ± σ):     16.525 s ±  0.884 s    [User: 11.355 s, System: 5.648 s]
  Range (min … max):   15.682 s … 18.088 s    10 runs
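These summaries are in the format hyperfine prints (mean ± σ over 10 runs). One way to weigh the gap against the run-to-run spread is to compute the relative change and Welch's t statistic directly from the reported figures. A minimal sketch, assuming σ is the sample standard deviation of the 10 runs; the helper names are hypothetical, and the square root uses Newton's method only so the example needs no libm.

```c
#include <assert.h>

/* Percentage reduction from a baseline mean to a patched mean (positive = faster). */
static double pct_reduction(double baseline, double patched)
{
	assert(baseline > 0.0);
	return 100.0 * (baseline - patched) / baseline;
}

/* Square root by Newton's method, to keep the sketch free of libm. */
static double newton_sqrt(double x)
{
	double r = x > 1.0 ? x : 1.0;

	assert(x > 0.0);
	for (int i = 0; i < 64; i++)
		r = 0.5 * (r + x / r);
	return r;
}

/*
 * Welch's t statistic for two means, assuming each sd is the sample
 * standard deviation over n runs.
 */
static double welch_t(double mean_a, double sd_a, int n_a,
		      double mean_b, double sd_b, int n_b)
{
	double se2;

	assert(n_a > 1 && n_b > 1);
	se2 = sd_a * sd_a / n_a + sd_b * sd_b / n_b;
	return (mean_a - mean_b) / newton_sqrt(se2);
}
```

With the numbers above (16.525 s ± 0.884 s without the patches, 15.918 s ± 0.833 s with them, 10 runs each), this gives a reduction of about 3.7% and a t value of about 1.6.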

johannes
Anton Ivanov Nov. 17, 2020, 11:46 a.m. UTC | #2

This is similar to what I get.

My usual test is:

  time busybox find /usr/lib/ -type f -exec cat {} > /dev/null \;

I discard the first run and use only runs from fs cache.
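The procedure described here (repeat the command, drop the cold first run, average the cached runs) can be scripted. A minimal sketch using POSIX `fork()`, `execvp()`, `waitpid()` and `clock_gettime(CLOCK_MONOTONIC)`; the helper names are hypothetical, and it measures wall-clock ("real") time only, not the user/sys split that `time` prints.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Wall-clock seconds for one run of argv; -1.0 if it fails or exits non-zero. */
static double time_once(char *const argv[])
{
	struct timespec t0, t1;
	int status;
	pid_t pid;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	pid = fork();
	if (pid < 0)
		return -1.0;
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127);	/* exec failed */
	}
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
	    WEXITSTATUS(status) != 0)
		return -1.0;
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return (double)(t1.tv_sec - t0.tv_sec) +
	       (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/*
 * Run the command `runs` times, discard the first (cold-cache) run and
 * return the mean of the rest; -1.0 if any run fails.
 */
static double mean_warm_runs(char *const argv[], int runs)
{
	double sum = 0.0;

	assert(runs >= 2);
	for (int i = 0; i < runs; i++) {
		double t = time_once(argv);

		if (t < 0.0)
			return -1.0;
		if (i > 0)	/* skip the first run: it warms the page cache */
			sum += t;
	}
	return sum / (runs - 1);
}
```

For the find test, `argv` could be `{"sh", "-c", "<the busybox find command>", NULL}` so that the shell handles the `> /dev/null` redirection as it does when typed interactively.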

With stock I get (seconds):

real	34.0 - 36.0
user	29.6 - 29.9
sys	3.4 - 3.6

With the patch-set I get (seconds):

real	32.0 - 34.0
user	28.2 - 29.2
sys	3.0 - 3.4

On the whole UBD device, dd if=/dev/zero of=/dev/null bs=1M gives 2.0 - 2.1 GB/s without the patches (2nd run and later) and 2.2 - 2.3 GB/s with them.

It is not a lot, but it is something: 2-5% on average, depending on the test.
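The figures in this message are ranges rather than means. One simple way to turn a pair of ranges into a single percentage is to compare their midpoints, as in the sketch below. This is an assumption about how to summarize the data and not necessarily how the 2-5% figure was arrived at; the type and function names are hypothetical.

```c
#include <assert.h>

/* A measured range as reported in the thread, e.g. "34.0 - 36.0". */
struct range {
	double lo;
	double hi;
};

static double midpoint(struct range r)
{
	assert(r.lo <= r.hi);
	return 0.5 * (r.lo + r.hi);
}

/* Percentage change of the midpoint; positive means the value went up. */
static double pct_change(struct range before, struct range after)
{
	double b = midpoint(before);

	assert(b > 0.0);
	return 100.0 * (midpoint(after) - b) / b;
}
```

For `real`, the midpoints 35.0 s and 33.0 s give about -5.7%; for the dd throughput, 2.05 GB/s and 2.25 GB/s give about +9.8%.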

The real gain would come from figuring out how to optimize the memory mapper. It is the "handbrake" that slows down everything else.
Johannes Berg Nov. 17, 2020, 12:11 p.m. UTC | #3
On Tue, 2020-11-17 at 11:46 +0000, Anton Ivanov wrote:
> 
> My usual test is:
> 
>   time busybox find /usr/lib/ -type f -exec cat {} > /dev/null \;
> 
> I discard the first run and use only runs from fs cache.

Oh. I didn't even run the timing inside. I ran it *outside*, something
like

time ./linux args... init=/path/to/test-script.sh

johannes
Anton Ivanov Nov. 17, 2020, 12:53 p.m. UTC | #4
On 17/11/2020 12:11, Johannes Berg wrote:
> On Tue, 2020-11-17 at 11:46 +0000, Anton Ivanov wrote:
>>
>> My usual test is:
>>
>>    time busybox find /usr/lib/ -type f -exec cat {} > /dev/null \;
>>
>> I discard the first run and use only runs from fs cache.
> 
> Oh. I didn't even run the timing inside. I ran it *outside*, something
> like
> 
> time ./linux args... init=/path/to/test-script.sh

I usually do a full set of tests on fs access, device IO access and a 
netperf after each patch.

Based on them it looks like it is worth it.

The more interesting question is whether this is the right organization.

We now have code in multiple places: arch/x86/um, arch/um, etc.

IMHO, we should probably look at getting it organized so that all 
sub-arches are under the um tree at some point.

Anton Ivanov Dec. 7, 2020, 6:18 p.m. UTC | #5
In the meantime, a backport of these patchsets (string, atomic, bitops,
xor, futex, etc.) to OpenWRT/UML has been running as my main CPE for 14 days.

I have not observed any stability issues and there is some visible 
improvement in CPU usage.



Patch

diff --git a/arch/um/include/asm/bitops-x86.h b/arch/um/include/asm/bitops-x86.h
new file mode 120000
index 000000000000..15a96ff554b2
--- /dev/null
+++ b/arch/um/include/asm/bitops-x86.h
@@ -0,0 +1 @@ 
+../../../x86/include/asm/bitops.h
\ No newline at end of file
diff --git a/arch/um/include/asm/bitops.h b/arch/um/include/asm/bitops.h
new file mode 100644
index 000000000000..e578c628a6d5
--- /dev/null
+++ b/arch/um/include/asm/bitops.h
@@ -0,0 +1,20 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_UM_BITOPS_H
+#define _ASM_UM_BITOPS_H
+
+#ifdef CONFIG_64BIT
+
+#undef CONFIG_X86_32
+
+#ifndef CONFIG_X86_64
+#define CONFIG_X86_64
+#endif
+
+#else
+#define CONFIG_X86_32
+#endif
+
+#include <asm/bitops-x86.h>
+
+
+#endif
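The `bitops.h` wrapper does no bit manipulation itself: it maps UML's `CONFIG_64BIT` onto the `CONFIG_X86_64`/`CONFIG_X86_32` symbols that the x86 header checks, then includes that header through the `bitops-x86.h` symlink. The mapping can be exercised outside the kernel. In this sketch `CONFIG_64BIT` is a hypothetical stand-in derived from the host's `long` size, and `selected_bits()` is a hypothetical consumer standing in for the x86 header's own `#ifdef` tests.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the Kconfig symbol CONFIG_64BIT, derived from
 * the host's long size so this sketch builds anywhere.
 */
#if __SIZEOF_LONG__ == 8
#define CONFIG_64BIT 1
#endif

/* The same mapping as the wrapper in the patch. */
#ifdef CONFIG_64BIT
#undef CONFIG_X86_32
#ifndef CONFIG_X86_64
#define CONFIG_X86_64
#endif
#else
#define CONFIG_X86_32
#endif

/* Hypothetical consumer: which variant an x86-style header would select. */
static int selected_bits(void)
{
	int bits = 0;

#if defined(CONFIG_X86_64)
	bits = 64;
#elif defined(CONFIG_X86_32)
	bits = 32;
#endif
	assert(bits != 0);	/* one variant must have been chosen */
	return bits;
}
```

The empty `#define CONFIG_X86_64` is enough for `#ifdef` tests, but the kernel's `IS_ENABLED()` macro only treats a symbol as enabled when it is defined to `1`, so any `IS_ENABLED(CONFIG_X86_64)` check reached through the included header would evaluate to 0.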