Message ID | 4E805923.50806@siemens.com |
---|---|
State | New |

On 26 September 2011 11:51, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> This increases the overhead of frequently executed helpers. We need to
> move rule past QEMU_CFLAGS assignment to ensure that the required simple
> assignment picks up all bits. The signal workaround is moved just for
> the sake of consistency.

> +# NOTE: Must be after the last QEMU_CFLAGS assignment
> +op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS)

Why also user-exec.o ? Why not the other source files with helpers in?
This doesn't seem very consistent. Maybe the right answer is to have
some of the offending helper functions inline instead? (Or to not
have -fstack-protector-all globally?)

-- PMM

On 2011-09-26 13:33, Peter Maydell wrote:
> On 26 September 2011 11:51, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> This increases the overhead of frequently executed helpers. We need to
>> move rule past QEMU_CFLAGS assignment to ensure that the required simple
>> assignment picks up all bits. The signal workaround is moved just for
>> the sake of consistency.
>
>> +# NOTE: Must be after the last QEMU_CFLAGS assignment
>> +op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS)
>
> Why also user-exec.o ?

That's a good question. It doesn't look like it deserves this.

> Why not the other source files with helpers in?

Name them and I'll add them.

> This doesn't seem very consistent. Maybe the right answer is to have
> some of the offending helper functions inline instead?

I can't imagine that this could be a short- or even mid-term answer.
Inlining is a huge amount of work.

> (Or to not
> have -fstack-protector-all globally?)

Opt-in instead of opt-out, that might be some approach, though I bet the
opt-out set still beats the opt-in crowd by some orders of magnitude.

Jan

On 26 September 2011 12:43, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2011-09-26 13:33, Peter Maydell wrote:
>> Why not the other source files with helpers in?
>
> Name them and I'll add them.

target-*/*helper.c ?

But mostly I think what I'm trying to say is that this is making
a tradeoff between safety and speed, so it ought to come with a
rationale for why it is OK to remove the safety checks for these
files. Given that rationale you can identify other files that are
also safe/worthwhile to flip the flag for.

>> (Or to not
>> have -fstack-protector-all globally?)
>
> Opt-in instead of opt-out, that might be some approach, though I bet the
> opt-out set still beats the opt-in crowd by some orders of magnitude.

Have you looked at whether using plain -fstack-protector for all files
(rather than the -all version) helps? Presumably we had some reason for
picking the -all version though...

-- PMM
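
For illustration only, the "plain -fstack-protector for all files" experiment asked about above could be tried with a one-line substitution. This is a stand-alone GNU make sketch, not QEMU's build system: the starting flag values and the `show` target are invented, and only the QEMU_CFLAGS name comes from the patch.

```makefile
# Hypothetical starting value; QEMU's real flags come from configure.
QEMU_CFLAGS = -O2 -g -fstack-protector-all

# A patsubst pattern without '%' matches only whole words that are
# exactly equal to it, so every other flag is left untouched.
QEMU_CFLAGS := $(patsubst -fstack-protector-all,-fstack-protector,$(QEMU_CFLAGS))

# Demo target (not part of QEMU): print the resulting flags.
.PHONY: show
show: ; @echo '$(QEMU_CFLAGS)'
```

As far as the GCC documentation describes it, plain -fstack-protector only guards functions that call alloca or contain sufficiently large buffers, so small arithmetic helpers would probably no longer pay for a canary. Whether that actually helps would still need measuring.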

On Mon, Sep 26, 2011 at 11:56 AM, Peter Maydell <peter.maydell@linaro.org> wrote:
> But mostly I think what I'm trying to say is that this is making
> a tradeoff between safety and speed, so it ought to come with a
> rationale for why it is OK to remove the safety checks for these
> files. Given that rationale you can identify other files that are
> also safe/worthwhile to flip the flag for.

I wouldn't remove -fstack-protector-all by default. Especially op code
interfaces with the guest.

For a max performance version, I'd check if -fomit-frame-pointer and -O3
make sense. See also this article:
https://www.debian-administration.org/article/672/Optimizing_code_via_compiler_flags

On 2011-09-26 19:22, Blue Swirl wrote:
> I wouldn't remove -fstack-protector-all by default. Especially op code
> interfaces with the guest.

I'd love to have some function attribute for this, because a stack
protector for rather simple arithmetic operations or something like
helper_cli/sti is pointlessly burned cycles. Maybe we can introduce
op_helper_simple.c.

> For a max performance version, I'd check if -fomit-frame-pointer and -O3
> make sense. See also this article:
> https://www.debian-administration.org/article/672/Optimizing_code_via_compiler_flags

We already run without frame pointers, -O3 might be worth exploring in
addition. Still, that won't take the protector overhead away.

Jan
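
A rough sketch of how the op_helper_simple.c idea might look on the make side, assuming such a file existed (it does not in the patch). Everything here is a stand-alone demo: the flag values are invented, and the echo rule merely stands in for the real compile step.

```makefile
# Hypothetical flags; the real QEMU_CFLAGS is assembled by configure.
QEMU_CFLAGS = -O2 -fstack-protector-all

all: op_helper.o op_helper_simple.o

# Only the hypothetical op_helper_simple.o loses the protector.
# filter-out removes whole words equal to the pattern and keeps the rest.
op_helper_simple.o: QEMU_CFLAGS := $(filter-out -fstack-protector-all,$(QEMU_CFLAGS))

# Stand-in for compilation: show which flags each object would get.
.PHONY: all op_helper.o op_helper_simple.o
op_helper.o op_helper_simple.o: ; @echo '$@: $(QEMU_CFLAGS)'
```

Keeping the opt-out list down to one explicitly named object would make the safety/speed tradeoff easy to audit: anything that touches guest-controlled buffers stays in op_helper.c and keeps its protection.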

On Mon, Sep 26, 2011 at 5:33 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> I'd love to have some function attribute for this, because a stack
> protector for rather simple arithmetic operations or something like
> helper_cli/sti is pointlessly burned cycles.

In order to avoid burning the cycles, there is a certain kernel module
which gives almost native performance.

> Maybe we can introduce op_helper_simple.c.
>
> We already run without frame pointers, -O3 might be worth exploring in
> addition. Still, that won't take the protector overhead away.

It would be interesting to have some benchmarks. I'd expect that most
of the run time is spent within generated code, the next largest item
should be the translator and any helpers should be marginal.

On 2011-09-26 20:20, Blue Swirl wrote:
> In order to avoid burning the cycles, there is a certain kernel module
> which gives almost native performance.

Well, even in 2011 there are cases remaining where VT-x/AMD-V is not
available to your favorite hypervisor.

> It would be interesting to have some benchmarks. I'd expect that most
> of the run time is spent within generated code, the next largest item
> should be the translator and any helpers should be marginal.

At least we have a rather static, long-running guest; not much is
recompiled once the system has settled.

Jan

On Mon, Sep 26, 2011 at 6:25 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> At least we have a rather static, long-running guest; not much is
> recompiled once the system has settled.

Optimizing the translator or improving the TCG optimizer could also
improve performance.

By the way, -fomit-frame-pointer is not enabled globally for some
reason; it should be the default for non-debug builds.

On 26 September 2011 19:20, Blue Swirl <blauwirbel@gmail.com> wrote:
> It would be interesting to have some benchmarks. I'd expect that most
> of the run time is spent within generated code, the next largest item
> should be the translator and any helpers should be marginal.

Depends a lot on the target, I suspect. On ARM at the moment we
spend huge amounts of time (30%+ of runtime) in a handful of helpers
like sub_cc because we haven't optimised the handling of subtract
and test to do things inline. (On my todo list but performance in
general is behind feature work. I have half a patchset that I
haven't got back to yet.)

Which is kind of echoing your remarks about benchmarks, really.

-- PMM

diff --git a/Makefile.target b/Makefile.target
index 88d2f1f..b545161 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -89,14 +89,6 @@ translate-all.o: translate-all.c cpu.h
 
 tcg/tcg.o: cpu.h
 
-# HELPER_CFLAGS is used for all the code compiled with static register
-# variables
-op_helper.o user-exec.o: QEMU_CFLAGS += $(HELPER_CFLAGS)
-
-# Note: this is a workaround. The real fix is to avoid compiling
-# cpu_signal_handler() in user-exec.c.
-signal.o: QEMU_CFLAGS += $(HELPER_CFLAGS)
-
 #########################################################
 # Linux user emulator target
 
@@ -387,6 +379,15 @@ obj-y += $(addprefix ../, $(trace-obj-y))
 
 endif # CONFIG_SOFTMMU
 
+# HELPER_CFLAGS is used for all the code compiled with static register
+# variables
+# NOTE: Must be after the last QEMU_CFLAGS assignment
+op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS)
+
+# Note: this is a workaround. The real fix is to avoid compiling
+# cpu_signal_handler() in user-exec.c.
+signal.o: QEMU_CFLAGS += $(HELPER_CFLAGS)
+
 ifndef CONFIG_LINUX_USER
 ifndef CONFIG_BSD_USER
 # libcacard needs qemu-thread support, and besides is only needed by devices

This increases the overhead of frequently executed helpers. We need to
move rule past QEMU_CFLAGS assignment to ensure that the required simple
assignment picks up all bits. The signal workaround is moved just for
the sake of consistency.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

Changes in v2:
 - unbreak qemu-user build

Maybe some real make guru has a nicer solution for removing the switch.

 Makefile.target |   17 +++++++++--------
 1 files changed, 9 insertions(+), 8 deletions(-)
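
The ordering requirement in the commit message can be reproduced in a small stand-alone Makefile. This is only a sketch under assumptions: the flag values, the -Wall addition and the early.o/append.o/late.o targets are hypothetical, and QEMU_CFLAGS is assumed to be a recursively expanded variable as it is set up here.

```makefile
# Stand-alone demo, not QEMU's Makefile.target.
QEMU_CFLAGS = -O2 -fstack-protector-all
HELPER_CFLAGS = -DHELPER_DEMO

all: early.o append.o late.o

# ':=' expands its right-hand side while make reads this line, so the
# -Wall appended further down never reaches early.o.
early.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS)

# '+=' on a recursively expanded variable is resolved when the target is
# built, so its position does not matter, but it can only add flags.
append.o: QEMU_CFLAGS += $(HELPER_CFLAGS)

QEMU_CFLAGS += -Wall

# Placed after the last assignment, the snapshot contains every flag.
late.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS)

# Stand-in for compilation: print the flags each object would receive.
.PHONY: all early.o append.o late.o
early.o append.o late.o: ; @echo '$@: $(QEMU_CFLAGS)'
```

Running `make` on this file should show early.o missing -Wall, append.o still carrying -fstack-protector-all, and late.o with the protector removed and everything else present. That is why removing a flag forces the simple assignment, and the simple assignment in turn forces the rule to the end of the file.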