[v2] tcg: Remove stack protection from helper functions

Message ID 4E805923.50806@siemens.com
State New

Commit Message

Jan Kiszka Sept. 26, 2011, 10:51 a.m. UTC
Stack protection increases the overhead of frequently executed helpers.
We need to move the rule past the last QEMU_CFLAGS assignment to ensure
that the required simple assignment picks up all bits. The signal
workaround is moved just for the sake of consistency.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

Changes in v2:
 - unbreak qemu-user build

Maybe some real make guru has a nicer solution for removing the switch.

 Makefile.target |   17 +++++++++--------
 1 files changed, 9 insertions(+), 8 deletions(-)
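
The ordering constraint comes from how GNU make treats `:=`: the right-hand
side is expanded once, at the point where the line is parsed, so anything
appended to QEMU_CFLAGS later never reaches it. A minimal sketch of that
behaviour follows. The flag values and the targets early.o and late.o are
stand-ins for illustration only and are not part of the patch.

```make
# Stand-in values; the real QEMU_CFLAGS and HELPER_CFLAGS are set elsewhere.
QEMU_CFLAGS := -O2 -fstack-protector-all
HELPER_CFLAGS := -DHELPER_EXAMPLE

# Parsed before the last append: the snapshot taken here lacks -Wall.
early.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS)

QEMU_CFLAGS += -Wall

# Parsed after the last append: the snapshot taken here includes -Wall.
late.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS)

# Print the flags each target would be compiled with.
.PHONY: early.o late.o
early.o late.o:
	@echo "$@: $(QEMU_CFLAGS)"
```

Saved as, say, sketch.mk and run with `make -f sketch.mk early.o late.o`,
the line for early.o should be missing -Wall while the one for late.o has
it. Neither contains -fstack-protector-all.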

Comments

Peter Maydell Sept. 26, 2011, 11:33 a.m. UTC | #1
On 26 September 2011 11:51, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> This increases the overhead of frequently executed helpers. We need to
> move rule past QEMU_CFLAGS assignment to ensure that the required simple
> assignment picks up all bits. The signal workaround is moved just for
> the sake of consistency.

> +# NOTE: Must be after the last QEMU_CFLAGS assignment
> +op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS)

Why also user-exec.o ? Why not the other source files with helpers in?
This doesn't seem very consistent. Maybe the right answer is to have
some of the offending helper functions inline instead? (Or to not
have -fstack-protector-all globally?)

-- PMM
Jan Kiszka Sept. 26, 2011, 11:43 a.m. UTC | #2
On 2011-09-26 13:33, Peter Maydell wrote:
> On 26 September 2011 11:51, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> This increases the overhead of frequently executed helpers. We need to
>> move rule past QEMU_CFLAGS assignment to ensure that the required simple
>> assignment picks up all bits. The signal workaround is moved just for
>> the sake of consistency.
> 
>> +# NOTE: Must be after the last QEMU_CFLAGS assignment
>> +op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS)
> 
> Why also user-exec.o ? 

That's a good question. It doesn't look like it deserves this.

> Why not the other source files with helpers in?

Name them and I'll add them.

> This doesn't seem very consistent. Maybe the right answer is to have
> some of the offending helper functions inline instead? 

I can't imagine that this could be a short- or even mid-term answer.
Inlining is a huge amount of work.

> (Or to not
> have -fstack-protector-all globally?)

Opt-in instead of opt-out, that might be some approach, though I bet the
opt-out set still beats the opt-in crowd by some orders of magnitude.

Jan
Peter Maydell Sept. 26, 2011, 11:56 a.m. UTC | #3
On 26 September 2011 12:43, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2011-09-26 13:33, Peter Maydell wrote:
>> On 26 September 2011 11:51, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>> This increases the overhead of frequently executed helpers. We need to
>>> move rule past QEMU_CFLAGS assignment to ensure that the required simple
>>> assignment picks up all bits. The signal workaround is moved just for
>>> the sake of consistency.
>>
>>> +# NOTE: Must be after the last QEMU_CFLAGS assignment
>>> +op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS)
>>
>> Why also user-exec.o ?
>
> That's a good question. It doesn't look like it's deserving this.
>
>> Why not the other source files with helpers in?
>
> Name them and I add them.

target-*/*helper.c ?

But mostly I think what I'm trying to say is that this is making
a tradeoff between safety and speed, so it ought to come with a
rationale for why it is OK to remove the safety checks for these
files. Given that rationale you can identify other files that are
also safe/worthwhile to flip the flag for.
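
If such a rationale were agreed, one way the wider set could be covered is
a pattern-specific variable instead of a list of object names. This is only
a rough sketch: it assumes the helper objects have names such as
op_helper.o or neon_helper.o in the build directory, and it leaves
HELPER_CFLAGS out because only op_helper.o and user-exec.o are known to
need it.

```make
# Hypothetical pattern rule; not part of the posted patch.
# NOTE: must still come after the last QEMU_CFLAGS assignment, because :=
# expands its right-hand side immediately.
# '%' only matches a non-empty string, so a plain helper.o would need its
# own line.
%helper.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS))
```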

>> (Or to not
>> have -fstack-protector-all globally?)
>
> Opt-in instead of opt-out, that might be some approach, though I bet the
> opt-out set still beats the opt-in crowd by some orders of magnitude.

Have you looked at whether using plain -fstack-protector for all files
(rather than the -all version) helps? Presumably we had some reason for
picking the -all version though...
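
Trying the plain variant could be as small as swapping the flag instead of
dropping it. The sketch below assumes the flag appears literally as
-fstack-protector-all in QEMU_CFLAGS. With GCC, plain -fstack-protector
only instruments functions that have vulnerable objects, such as
sufficiently large character buffers or calls to alloca, so trivial helpers
would typically get no guard at all.

```make
# Hypothetical global swap; not part of the posted patch.
# Placed after the last QEMU_CFLAGS assignment, since := expands immediately.
QEMU_CFLAGS := $(subst -fstack-protector-all,-fstack-protector,$(QEMU_CFLAGS))
```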

-- PMM
Blue Swirl Sept. 26, 2011, 5:22 p.m. UTC | #4
On Mon, Sep 26, 2011 at 11:56 AM, Peter Maydell
<peter.maydell@linaro.org> wrote:
> On 26 September 2011 12:43, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> On 2011-09-26 13:33, Peter Maydell wrote:
>>> On 26 September 2011 11:51, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>> This increases the overhead of frequently executed helpers. We need to
>>>> move rule past QEMU_CFLAGS assignment to ensure that the required simple
>>>> assignment picks up all bits. The signal workaround is moved just for
>>>> the sake of consistency.
>>>
>>>> +# NOTE: Must be after the last QEMU_CFLAGS assignment
>>>> +op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS)
>>>
>>> Why also user-exec.o ?
>>
>> That's a good question. It doesn't look like it's deserving this.
>>
>>> Why not the other source files with helpers in?
>>
>> Name them and I add them.
>
> target-*/*helper.c ?
>
> But mostly I think what I'm trying to say is that this is making
> a tradeoff between safety and speed, so it ought to come with a
> rationale for why it is OK to remove the safety checks for these
> files. Given that rationale you can identify other files that are
> also safe/worthwhile to flip the flag for.

I wouldn't remove -fstack-protector-all by default. Especially op code
interfaces with the guest.

For max performance version, I'd check if -fomit-frame-pointer and -O3
makes sense. See also this article:
https://www.debian-administration.org/article/672/Optimizing_code_via_compiler_flags

>>> (Or to not
>>> have -fstack-protector-all globally?)
>>
>> Opt-in instead of opt-out, that might be some approach, though I bet the
>> opt-out set still beats the opt-in crowd by some orders of magnitude.
>
> Have you looked at whether using plain -fstack-protector for all files
> (rather than the -all version) helps? Presumably we had some reason for
> picking the -all version though...
>
> -- PMM
>
Jan Kiszka Sept. 26, 2011, 5:33 p.m. UTC | #5
On 2011-09-26 19:22, Blue Swirl wrote:
> On Mon, Sep 26, 2011 at 11:56 AM, Peter Maydell
> <peter.maydell@linaro.org> wrote:
>> On 26 September 2011 12:43, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>> On 2011-09-26 13:33, Peter Maydell wrote:
>>>> On 26 September 2011 11:51, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>>> This increases the overhead of frequently executed helpers. We need to
>>>>> move rule past QEMU_CFLAGS assignment to ensure that the required simple
>>>>> assignment picks up all bits. The signal workaround is moved just for
>>>>> the sake of consistency.
>>>>
>>>>> +# NOTE: Must be after the last QEMU_CFLAGS assignment
>>>>> +op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS)
>>>>
>>>> Why also user-exec.o ?
>>>
>>> That's a good question. It doesn't look like it's deserving this.
>>>
>>>> Why not the other source files with helpers in?
>>>
>>> Name them and I add them.
>>
>> target-*/*helper.c ?
>>
>> But mostly I think what I'm trying to say is that this is making
>> a tradeoff between safety and speed, so it ought to come with a
>> rationale for why it is OK to remove the safety checks for these
>> files. Given that rationale you can identify other files that are
>> also safe/worthwhile to flip the flag for.
> 
> I wouldn't remove -fstack-protector-all by default. Especially op code
> interfaces with the guest.

I'd love to have some function attribute for this, because a stack
protector for rather simple arithmetic operations or something like
helper_cli/sti are pointlessly burned cycles.

Maybe we can introduce op_helper_simple.c.
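
A rough sketch of how that split might look in Makefile.target, assuming a
new op_helper_simple.o that holds only the trivial helpers (the file does
not exist yet, so the name is hypothetical):

```make
# op_helper_simple.o is hypothetical: trivial helpers only, no protector.
# NOTE: must come after the last QEMU_CFLAGS assignment, since := expands
# its right-hand side immediately.
op_helper_simple.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS)

# Everything left in op_helper.o keeps -fstack-protector-all.
op_helper.o: QEMU_CFLAGS += $(HELPER_CFLAGS)
```

That would keep the opt-out confined to code that has been reviewed as
simple, at the cost of maintaining the split by hand.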

> 
> For max performance version, I'd check if -fomit-frame-pointer and -O3
> makes sense. See also this article:
> https://www.debian-administration.org/article/672/Optimizing_code_via_compiler_flags

We already run without frame pointers, -O3 might be worth exploring in
addition. Still, that won't take the protector overhead away.

Jan
Blue Swirl Sept. 26, 2011, 6:20 p.m. UTC | #6
On Mon, Sep 26, 2011 at 5:33 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2011-09-26 19:22, Blue Swirl wrote:
>> On Mon, Sep 26, 2011 at 11:56 AM, Peter Maydell
>> <peter.maydell@linaro.org> wrote:
>>> On 26 September 2011 12:43, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>> On 2011-09-26 13:33, Peter Maydell wrote:
>>>>> On 26 September 2011 11:51, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>>>> This increases the overhead of frequently executed helpers. We need to
>>>>>> move rule past QEMU_CFLAGS assignment to ensure that the required simple
>>>>>> assignment picks up all bits. The signal workaround is moved just for
>>>>>> the sake of consistency.
>>>>>
>>>>>> +# NOTE: Must be after the last QEMU_CFLAGS assignment
>>>>>> +op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS)
>>>>>
>>>>> Why also user-exec.o ?
>>>>
>>>> That's a good question. It doesn't look like it's deserving this.
>>>>
>>>>> Why not the other source files with helpers in?
>>>>
>>>> Name them and I add them.
>>>
>>> target-*/*helper.c ?
>>>
>>> But mostly I think what I'm trying to say is that this is making
>>> a tradeoff between safety and speed, so it ought to come with a
>>> rationale for why it is OK to remove the safety checks for these
>>> files. Given that rationale you can identify other files that are
>>> also safe/worthwhile to flip the flag for.
>>
>> I wouldn't remove -fstack-protector-all by default. Especially op code
>> interfaces with the guest.
>
> I'd love to have some function attribute for this, because a stack
> protector for rather simple arithmetic operations or something like
> helper_cli/sti are pointlessly burned cycles.

In order to avoid burning the cycles, there is a certain kernel module
which gives almost native performance.

> Maybe we can introduce op_helper_simple.c.
>
>>
>> For max performance version, I'd check if -fomit-frame-pointer and -O3
>> makes sense. See also this article:
>> https://www.debian-administration.org/article/672/Optimizing_code_via_compiler_flags
>
> We already run without frame pointers, -O3 might be worth exploring in
> addition. Still, that won't take the protector overhead away.

It would be interesting to have some benchmarks. I'd expect that most
of the run time is spent within generated code, the next largest item
should be the translator and any helpers should be marginal.
Jan Kiszka Sept. 26, 2011, 6:25 p.m. UTC | #7
On 2011-09-26 20:20, Blue Swirl wrote:
> On Mon, Sep 26, 2011 at 5:33 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> On 2011-09-26 19:22, Blue Swirl wrote:
>>> On Mon, Sep 26, 2011 at 11:56 AM, Peter Maydell
>>> <peter.maydell@linaro.org> wrote:
>>>> On 26 September 2011 12:43, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>>> On 2011-09-26 13:33, Peter Maydell wrote:
>>>>>> On 26 September 2011 11:51, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>>>>> This increases the overhead of frequently executed helpers. We need to
>>>>>>> move rule past QEMU_CFLAGS assignment to ensure that the required simple
>>>>>>> assignment picks up all bits. The signal workaround is moved just for
>>>>>>> the sake of consistency.
>>>>>>
>>>>>>> +# NOTE: Must be after the last QEMU_CFLAGS assignment
>>>>>>> +op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS)
>>>>>>
>>>>>> Why also user-exec.o ?
>>>>>
>>>>> That's a good question. It doesn't look like it's deserving this.
>>>>>
>>>>>> Why not the other source files with helpers in?
>>>>>
>>>>> Name them and I add them.
>>>>
>>>> target-*/*helper.c ?
>>>>
>>>> But mostly I think what I'm trying to say is that this is making
>>>> a tradeoff between safety and speed, so it ought to come with a
>>>> rationale for why it is OK to remove the safety checks for these
>>>> files. Given that rationale you can identify other files that are
>>>> also safe/worthwhile to flip the flag for.
>>>
>>> I wouldn't remove -fstack-protector-all by default. Especially op code
>>> interfaces with the guest.
>>
>> I'd love to have some function attribute for this, because a stack
>> protector for rather simple arithmetic operations or something like
>> helper_cli/sti are pointlessly burned cycles.
> 
> In order to avoid burning the cycles, there is a certain kernel module
> which gives almost native performance.

Well, even in 2011 there are cases remaining where VT-x/AMD-V is not
available to your favorite hypervisor.

> 
>> Maybe we can introduce op_helper_simple.c.
>>
>>>
>>> For max performance version, I'd check if -fomit-frame-pointer and -O3
>>> makes sense. See also this article:
>>> https://www.debian-administration.org/article/672/Optimizing_code_via_compiler_flags
>>
>> We already run without frame pointers, -O3 might be worth exploring in
>> addition. Still, that won't take the protector overhead away.
> 
> It would be interesting to have some benchmarks. I'd expect that most
> of the run time is spent within generated code, the next largest item
> should be the translator and any helpers should be marginal.

At least we've a rather static, long-running guest, not much is
recompiled once the system has settled.

Jan
Blue Swirl Sept. 26, 2011, 6:40 p.m. UTC | #8
On Mon, Sep 26, 2011 at 6:25 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2011-09-26 20:20, Blue Swirl wrote:
>> On Mon, Sep 26, 2011 at 5:33 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>> On 2011-09-26 19:22, Blue Swirl wrote:
>>>> On Mon, Sep 26, 2011 at 11:56 AM, Peter Maydell
>>>> <peter.maydell@linaro.org> wrote:
>>>>> On 26 September 2011 12:43, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>>>> On 2011-09-26 13:33, Peter Maydell wrote:
>>>>>>> On 26 September 2011 11:51, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>>>>>> This increases the overhead of frequently executed helpers. We need to
>>>>>>>> move rule past QEMU_CFLAGS assignment to ensure that the required simple
>>>>>>>> assignment picks up all bits. The signal workaround is moved just for
>>>>>>>> the sake of consistency.
>>>>>>>
>>>>>>>> +# NOTE: Must be after the last QEMU_CFLAGS assignment
>>>>>>>> +op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS)
>>>>>>>
>>>>>>> Why also user-exec.o ?
>>>>>>
>>>>>> That's a good question. It doesn't look like it's deserving this.
>>>>>>
>>>>>>> Why not the other source files with helpers in?
>>>>>>
>>>>>> Name them and I add them.
>>>>>
>>>>> target-*/*helper.c ?
>>>>>
>>>>> But mostly I think what I'm trying to say is that this is making
>>>>> a tradeoff between safety and speed, so it ought to come with a
>>>>> rationale for why it is OK to remove the safety checks for these
>>>>> files. Given that rationale you can identify other files that are
>>>>> also safe/worthwhile to flip the flag for.
>>>>
>>>> I wouldn't remove -fstack-protector-all by default. Especially op code
>>>> interfaces with the guest.
>>>
>>> I'd love to have some function attribute for this, because a stack
>>> protector for rather simple arithmetic operations or something like
>>> helper_cli/sti are pointlessly burned cycles.
>>
>> In order to avoid burning the cycles, there is a certain kernel module
>> which gives almost native performance.
>
> Well, even in 2011 there are cases remaining where VT-x/AMD-V is not
> available to your favorite hypervisor.
>
>>
>>> Maybe we can introduce op_helper_simple.c.
>>>
>>>>
>>>> For max performance version, I'd check if -fomit-frame-pointer and -O3
>>>> makes sense. See also this article:
>>>> https://www.debian-administration.org/article/672/Optimizing_code_via_compiler_flags
>>>
>>> We already run without frame pointers, -O3 might be worth exploring in
>>> addition. Still, that won't take the protector overhead away.
>>
>> It would be interesting to have some benchmarks. I'd expect that most
>> of the run time is spent within generated code, the next largest item
>> should be the translator and any helpers should be marginal.
>
> At least we've a rather static, long-running guest, not much is
> recompiled once the system has settled.

Optimizing the translator or improving the TCG optimizer could also
improve performance.

By the way, -fomit-frame-pointer is not enabled globally for some
reason, it should be the default for non-debug builds.
Peter Maydell Sept. 26, 2011, 7:08 p.m. UTC | #9
On 26 September 2011 19:20, Blue Swirl <blauwirbel@gmail.com> wrote:
> It would be interesting to have some benchmarks. I'd expect that most
> of the run time is spent within generated code, the next largest item
> should be the translator and any helpers should be marginal.

Depends a lot on the target, I suspect. On ARM at the moment
we spend huge amounts of time (30%+ of runtime) in a handful
of helpers like sub_cc because we haven't optimised the handling
of subtract and test to do things inline. (On my todo list but
performance in general is behind feature work. I have half a
patchset that I haven't got back to yet.)

Which is kind of echoing your remarks about benchmarks, really.

-- PMM

Patch

diff --git a/Makefile.target b/Makefile.target
index 88d2f1f..b545161 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -89,14 +89,6 @@  translate-all.o: translate-all.c cpu.h
 
 tcg/tcg.o: cpu.h
 
-# HELPER_CFLAGS is used for all the code compiled with static register
-# variables
-op_helper.o user-exec.o: QEMU_CFLAGS += $(HELPER_CFLAGS)
-
-# Note: this is a workaround. The real fix is to avoid compiling
-# cpu_signal_handler() in user-exec.c.
-signal.o: QEMU_CFLAGS += $(HELPER_CFLAGS)
-
 #########################################################
 # Linux user emulator target
 
@@ -387,6 +379,15 @@  obj-y += $(addprefix ../, $(trace-obj-y))
 
 endif # CONFIG_SOFTMMU
 
+# HELPER_CFLAGS is used for all the code compiled with static register
+# variables
+# NOTE: Must be after the last QEMU_CFLAGS assignment
+op_helper.o user-exec.o: QEMU_CFLAGS := $(subst -fstack-protector-all,,$(QEMU_CFLAGS)) $(HELPER_CFLAGS)
+
+# Note: this is a workaround. The real fix is to avoid compiling
+# cpu_signal_handler() in user-exec.c.
+signal.o: QEMU_CFLAGS += $(HELPER_CFLAGS)
+
 ifndef CONFIG_LINUX_USER
 ifndef CONFIG_BSD_USER
 # libcacard needs qemu-thread support, and besides is only needed by devices