diff mbox series

[RFC,19/26] cpu-exec: reset exit flag before calling cpu_exec_nocache

Message ID 20171031112644.10516.1734.stgit@pasha-VirtualBox
State New
Headers show
Series replay additions | expand

Commit Message

Pavel Dovgalyuk Oct. 31, 2017, 11:26 a.m. UTC
This patch resets icount_decr.u32.high before calling cpu_exec_nocache
when exception is pending. Exception is caused by the first instruction
in the block and it cannot be executed without resetting the flag.

Signed-off-by: Maria Klimushenkova <maria.klimushenkova@ispras.ru>
Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>

---
 accel/tcg/cpu-exec.c |    1 +
 1 file changed, 1 insertion(+)

Comments

Paolo Bonzini Nov. 2, 2017, 11:17 a.m. UTC | #1
On 31/10/2017 12:26, Pavel Dovgalyuk wrote:
> This patch resets icount_decr.u32.high before calling cpu_exec_nocache
> when exception is pending. Exception is caused by the first instruction
> in the block and it cannot be executed without resetting the flag.
> 
> Signed-off-by: Maria Klimushenkova <maria.klimushenkova@ispras.ru>
> Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
> 
> ---
>  accel/tcg/cpu-exec.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index 35d0240..aaa9c2d 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -500,6 +500,7 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
>      } else if (replay_has_exception()
>                 && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
>          /* try to cause an exception pending in the log */
> +        atomic_set(&cpu->icount_decr.u16.high, 0);
>          cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0, curr_cflags()), true);
>          *ret = -1;
>          return true;
> 

I am not sure about this.  I think if instead you should return false 
from here and EXCP_INTERRUPT from cpu_exec.

More important: there is still a race, because high can be set to -1 
right after your atomic_set.  Maybe:

1) you should only return true if cpu->exception_index was set by 
cpu_exec_nocache?

2) you should not do

    *ret = -1;
    return true;

and instead do

    if (cpu->exception_index < 0 &&
        replay_has_exception() &&
        cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
           /* try to cause an exception pending in the log */
           cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0, curr_cflags()), true);
        }
    }
    if (cpu->exception_index >= 0) {
        ...
    }
    return false;

Paolo
Pavel Dovgalyuk Nov. 2, 2017, 11:24 a.m. UTC | #2
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> On 31/10/2017 12:26, Pavel Dovgalyuk wrote:
> > This patch resets icount_decr.u32.high before calling cpu_exec_nocache
> > when exception is pending. Exception is caused by the first instruction
> > in the block and it cannot be executed without resetting the flag.
> >
> > Signed-off-by: Maria Klimushenkova <maria.klimushenkova@ispras.ru>
> > Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
> >
> > ---
> >  accel/tcg/cpu-exec.c |    1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> > index 35d0240..aaa9c2d 100644
> > --- a/accel/tcg/cpu-exec.c
> > +++ b/accel/tcg/cpu-exec.c
> > @@ -500,6 +500,7 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
> >      } else if (replay_has_exception()
> >                 && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
> >          /* try to cause an exception pending in the log */
> > +        atomic_set(&cpu->icount_decr.u16.high, 0);
> >          cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0, curr_cflags()), true);
> >          *ret = -1;
> >          return true;
> >
> 
> I am not sure about this.  I think if instead you should return false
> from here and EXCP_INTERRUPT from cpu_exec.

The problem is inside the TB. It checks cpu->icount_decr.u16.high which is -1.
And we have to enter the TB to cause an exception (because it exists in replay log).
That is why we reset this flag and try to execute the TB.

> More important: there is still a race, because high can be set to -1
> right after your atomic_set.

I'm not sure about it. But even the race exists, exec_nocache attempt will be repeated
after failed try.

Returning true is ok here, because we know that exception will happen (because it is
recorded in the log).

Pavel Dovgalyuk
Paolo Bonzini Nov. 2, 2017, 11:33 a.m. UTC | #3
On 02/11/2017 12:24, Pavel Dovgalyuk wrote:
>> I am not sure about this.  I think if instead you should return false
>> from here and EXCP_INTERRUPT from cpu_exec.
> The problem is inside the TB. It checks cpu->icount_decr.u16.high which is -1.
> And we have to enter the TB to cause an exception (because it exists in replay log).
> That is why we reset this flag and try to execute the TB.

But if u16.high is -1, shouldn't you return EXCP_INTERRUPT first (via
"Finally, check if we need to exit to the main loop" in
cpu_handle_interrupt)?  Then only cause the exception when that one is
processed.

Paolo

>> More important: there is still a race, because high can be set to -1
>> right after your atomic_set.
> I'm not sure about it. But even the race exists, exec_nocache attempt will be repeated
> after failed try.
> 
> Returning true is ok here, because we know that exception will happen (because it is
> recorded in the log).
Paolo Bonzini Nov. 2, 2017, 11:46 a.m. UTC | #4
On 02/11/2017 12:33, Paolo Bonzini wrote:
> On 02/11/2017 12:24, Pavel Dovgalyuk wrote:
>>> I am not sure about this.  I think if instead you should return false
>>> from here and EXCP_INTERRUPT from cpu_exec.
>> The problem is inside the TB. It checks cpu->icount_decr.u16.high which is -1.
>> And we have to enter the TB to cause an exception (because it exists in replay log).
>> That is why we reset this flag and try to execute the TB.
> 
> But if u16.high is -1, shouldn't you return EXCP_INTERRUPT first (via
> "Finally, check if we need to exit to the main loop" in
> cpu_handle_interrupt)?  Then only cause the exception when that one is
> processed.

... indeed, you probably need something like:

    /* Clear the interrupt flag now since we're processing
     * cpu->interrupt_request and cpu->exit_request.
     */
    insns_left = atomic_read(&cpu->icount_decr.u32);
    atomic_set(&cpu->icount_decr.u16.high, 0);
    if (unlikely(insns_left < 0) {
        /* Ensure the zeroing of icount_decr comes before the next read
         * of cpu->exit_request or cpu->interrupt_request.
         */
        smb_mb();
    }

at the top of cpu_handle_interrupt.  Then you can remove the same
atomic_set+smp_mb in cpu_loop_exec_tb, like

    *last_tb = NULL;
    insns_left = atomic_read(&cpu->icount_decr.u32);
    if (insns_left < 0) {
        /* Something asked us to stop executing chained TBs; just
         * continue round the main loop. Whatever requested the exit
         * will also have set something else (eg exit_request or
         * interrupt_request) which will be handled by
         * cpu_handle_interrupt.  cpu_handle_interrupt will also
         * clear cpu->icount_decr.u16.high.
         */
        return;
    }

Thanks,

Paolo
Pavel Dovgalyuk Nov. 2, 2017, 12:45 p.m. UTC | #5
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> On 02/11/2017 12:24, Pavel Dovgalyuk wrote:
> >> I am not sure about this.  I think if instead you should return false
> >> from here and EXCP_INTERRUPT from cpu_exec.
> > The problem is inside the TB. It checks cpu->icount_decr.u16.high which is -1.
> > And we have to enter the TB to cause an exception (because it exists in replay log).
> > That is why we reset this flag and try to execute the TB.
> 
> But if u16.high is -1, shouldn't you return EXCP_INTERRUPT first (via
> "Finally, check if we need to exit to the main loop" in
> cpu_handle_interrupt)?  Then only cause the exception when that one is
> processed.

The case is the following.
1. There are no pending instructions to execute, cpu_loop_exec_tb finished.
2. There are no interrupts and cpu_handle_interrupt sets cpu->exception_index = EXCP_INTERRUPT
3. There are no pending exceptions and cpu_handle_exception goes to the last branch,
   because there is an exception flag in the log.
4. cpu_exec_nocache translates the block and tries to execute it, causing an exception

> 
> Paolo
> 
> >> More important: there is still a race, because high can be set to -1
> >> right after your atomic_set.
> > I'm not sure about it. But even the race exists, exec_nocache attempt will be repeated
> > after failed try.
> >
> > Returning true is ok here, because we know that exception will happen (because it is
> > recorded in the log).


Pavel Dovgalyuk
Paolo Bonzini Nov. 2, 2017, 2:43 p.m. UTC | #6
On 02/11/2017 13:45, Pavel Dovgalyuk wrote:
>> But if u16.high is -1, shouldn't you return EXCP_INTERRUPT first (via
>> "Finally, check if we need to exit to the main loop" in
>> cpu_handle_interrupt)?  Then only cause the exception when that one is
>> processed.
> The case is the following.
> 1. There are no pending instructions to execute, cpu_loop_exec_tb finished.
> 2. There are no interrupts and cpu_handle_interrupt sets cpu->exception_index = EXCP_INTERRUPT
> 3. There are no pending exceptions and cpu_handle_exception goes to the last branch,
>    because there is an exception flag in the log.
> 4. cpu_exec_nocache translates the block and tries to execute it, causing an exception
> 

Then the fix is indeed to clear u16.high in cpu_handle_interrupt instead
of cpu_loop_exec_tb---see my other reply.

Paolo
Pavel Dovgalyuk Nov. 3, 2017, 8:27 a.m. UTC | #7
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> On 02/11/2017 12:33, Paolo Bonzini wrote:
> > On 02/11/2017 12:24, Pavel Dovgalyuk wrote:
> >>> I am not sure about this.  I think if instead you should return false
> >>> from here and EXCP_INTERRUPT from cpu_exec.
> >> The problem is inside the TB. It checks cpu->icount_decr.u16.high which is -1.
> >> And we have to enter the TB to cause an exception (because it exists in replay log).
> >> That is why we reset this flag and try to execute the TB.
> >
> > But if u16.high is -1, shouldn't you return EXCP_INTERRUPT first (via
> > "Finally, check if we need to exit to the main loop" in
> > cpu_handle_interrupt)?  Then only cause the exception when that one is
> > processed.
> 
> ... indeed, you probably need something like:
> 
>     /* Clear the interrupt flag now since we're processing
>      * cpu->interrupt_request and cpu->exit_request.
>      */
>     insns_left = atomic_read(&cpu->icount_decr.u32);
>     atomic_set(&cpu->icount_decr.u16.high, 0);
>     if (unlikely(insns_left < 0) {
>         /* Ensure the zeroing of icount_decr comes before the next read
>          * of cpu->exit_request or cpu->interrupt_request.
>          */
>         smb_mb();
>     }
> 
> at the top of cpu_handle_interrupt.  Then you can remove the same
> atomic_set+smp_mb in cpu_loop_exec_tb, like
> 
>     *last_tb = NULL;
>     insns_left = atomic_read(&cpu->icount_decr.u32);
>     if (insns_left < 0) {
>         /* Something asked us to stop executing chained TBs; just
>          * continue round the main loop. Whatever requested the exit
>          * will also have set something else (eg exit_request or
>          * interrupt_request) which will be handled by
>          * cpu_handle_interrupt.  cpu_handle_interrupt will also
>          * clear cpu->icount_decr.u16.high.
>          */
>         return;
>     }

I tried this approach and it didn't work.
I think iothread sets u16.high flag after resetting it in cpu_handle_interrupt.

But there is another possible approach: set new tcg flag, which disables checking
the exit_request at the entry to the TB.


Pavel Dovgalyuk
Paolo Bonzini Nov. 6, 2017, 1:48 p.m. UTC | #8
On 03/11/2017 09:27, Pavel Dovgalyuk wrote:
>> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
>> On 02/11/2017 12:33, Paolo Bonzini wrote:
>>> On 02/11/2017 12:24, Pavel Dovgalyuk wrote:
>>>>> I am not sure about this.  I think if instead you should return false
>>>>> from here and EXCP_INTERRUPT from cpu_exec.
>>>> The problem is inside the TB. It checks cpu->icount_decr.u16.high which is -1.
>>>> And we have to enter the TB to cause an exception (because it exists in replay log).
>>>> That is why we reset this flag and try to execute the TB.
>>>
>>> But if u16.high is -1, shouldn't you return EXCP_INTERRUPT first (via
>>> "Finally, check if we need to exit to the main loop" in
>>> cpu_handle_interrupt)?  Then only cause the exception when that one is
>>> processed.
>>
>> ... indeed, you probably need something like:
>>
>>     /* Clear the interrupt flag now since we're processing
>>      * cpu->interrupt_request and cpu->exit_request.
>>      */
>>     insns_left = atomic_read(&cpu->icount_decr.u32);
>>     atomic_set(&cpu->icount_decr.u16.high, 0);
>>     if (unlikely(insns_left < 0) {
>>         /* Ensure the zeroing of icount_decr comes before the next read
>>          * of cpu->exit_request or cpu->interrupt_request.
>>          */
>>         smb_mb();
>>     }
>>
>> at the top of cpu_handle_interrupt.  Then you can remove the same
>> atomic_set+smp_mb in cpu_loop_exec_tb, like
>>
>>     *last_tb = NULL;
>>     insns_left = atomic_read(&cpu->icount_decr.u32);
>>     if (insns_left < 0) {
>>         /* Something asked us to stop executing chained TBs; just
>>          * continue round the main loop. Whatever requested the exit
>>          * will also have set something else (eg exit_request or
>>          * interrupt_request) which will be handled by
>>          * cpu_handle_interrupt.  cpu_handle_interrupt will also
>>          * clear cpu->icount_decr.u16.high.
>>          */
>>         return;
>>     }
> 
> I tried this approach and it didn't work.
> I think iothread sets u16.high flag after resetting it in cpu_handle_interrupt.

But why is this a problem?  The TB would exit immediately and go again
to cpu_handle_interrupt.  cpu_handle_interrupt returns true and
cpu_handle_exception causes the exception via cpu_exec_nocache.

> But there is another possible approach: set new tcg flag, which disables checking
> the exit_request at the entry to the TB.

No, I don't want to add complication.  I want to understand what's going
on---it's always worked very well whenever you pushed new rr bits,
overall we ended up with a *simpler* CPU loop I think. :)

Paolo
Alex Bennée Nov. 6, 2017, 2:01 p.m. UTC | #9
Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
>> On 02/11/2017 12:33, Paolo Bonzini wrote:
>> > On 02/11/2017 12:24, Pavel Dovgalyuk wrote:
>> >>> I am not sure about this.  I think if instead you should return false
>> >>> from here and EXCP_INTERRUPT from cpu_exec.
>> >> The problem is inside the TB. It checks cpu->icount_decr.u16.high which is -1.
>> >> And we have to enter the TB to cause an exception (because it exists in replay log).
>> >> That is why we reset this flag and try to execute the TB.
>> >
>> > But if u16.high is -1, shouldn't you return EXCP_INTERRUPT first (via
>> > "Finally, check if we need to exit to the main loop" in
>> > cpu_handle_interrupt)?  Then only cause the exception when that one is
>> > processed.
>>
>> ... indeed, you probably need something like:
>>
>>     /* Clear the interrupt flag now since we're processing
>>      * cpu->interrupt_request and cpu->exit_request.
>>      */
>>     insns_left = atomic_read(&cpu->icount_decr.u32);
>>     atomic_set(&cpu->icount_decr.u16.high, 0);
>>     if (unlikely(insns_left < 0) {
>>         /* Ensure the zeroing of icount_decr comes before the next read
>>          * of cpu->exit_request or cpu->interrupt_request.
>>          */
>>         smb_mb();
>>     }
>>
>> at the top of cpu_handle_interrupt.  Then you can remove the same
>> atomic_set+smp_mb in cpu_loop_exec_tb, like
>>
>>     *last_tb = NULL;
>>     insns_left = atomic_read(&cpu->icount_decr.u32);
>>     if (insns_left < 0) {
>>         /* Something asked us to stop executing chained TBs; just
>>          * continue round the main loop. Whatever requested the exit
>>          * will also have set something else (eg exit_request or
>>          * interrupt_request) which will be handled by
>>          * cpu_handle_interrupt.  cpu_handle_interrupt will also
>>          * clear cpu->icount_decr.u16.high.
>>          */
>>         return;
>>     }
>
> I tried this approach and it didn't work.
> I think iothread sets u16.high flag after resetting it in cpu_handle_interrupt.
>
> But there is another possible approach: set new tcg flag, which disables checking
> the exit_request at the entry to the TB.

I'm instinctively wary of adding additional flags as it complicates the
number of exit states - especially as we went to the trouble of merging
it all into one (wide) flag.

Where in the TB is this exception we need to execute? It can't be more
than one instruction can it - otherwise the icount would be higher. Is
this an off-by-one issue?

Would another approach be to say:

  - icount is 0
  - we have a pending exception not yet generated

therefor

  - translate a new, non-cached, non-icount/exit checking, single
    instruction block for the exception

and then assert if an exception wasn't raised?

>
>
> Pavel Dovgalyuk


--
Alex Bennée
Pavel Dovgalyuk Nov. 10, 2017, 8:20 a.m. UTC | #10
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> On 03/11/2017 09:27, Pavel Dovgalyuk wrote:
> >> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> >> On 02/11/2017 12:33, Paolo Bonzini wrote:
> >>> On 02/11/2017 12:24, Pavel Dovgalyuk wrote:
> >>>>> I am not sure about this.  I think if instead you should return false
> >>>>> from here and EXCP_INTERRUPT from cpu_exec.
> >>>> The problem is inside the TB. It checks cpu->icount_decr.u16.high which is -1.
> >>>> And we have to enter the TB to cause an exception (because it exists in replay log).
> >>>> That is why we reset this flag and try to execute the TB.
> >>>
> >>> But if u16.high is -1, shouldn't you return EXCP_INTERRUPT first (via
> >>> "Finally, check if we need to exit to the main loop" in
> >>> cpu_handle_interrupt)?  Then only cause the exception when that one is
> >>> processed.
> >>
> >> ... indeed, you probably need something like:
> >>
> >>     /* Clear the interrupt flag now since we're processing
> >>      * cpu->interrupt_request and cpu->exit_request.
> >>      */
> >>     insns_left = atomic_read(&cpu->icount_decr.u32);
> >>     atomic_set(&cpu->icount_decr.u16.high, 0);
> >>     if (unlikely(insns_left < 0) {
> >>         /* Ensure the zeroing of icount_decr comes before the next read
> >>          * of cpu->exit_request or cpu->interrupt_request.
> >>          */
> >>         smb_mb();
> >>     }
> >>
> >> at the top of cpu_handle_interrupt.  Then you can remove the same
> >> atomic_set+smp_mb in cpu_loop_exec_tb, like
> >>
> >>     *last_tb = NULL;
> >>     insns_left = atomic_read(&cpu->icount_decr.u32);
> >>     if (insns_left < 0) {
> >>         /* Something asked us to stop executing chained TBs; just
> >>          * continue round the main loop. Whatever requested the exit
> >>          * will also have set something else (eg exit_request or
> >>          * interrupt_request) which will be handled by
> >>          * cpu_handle_interrupt.  cpu_handle_interrupt will also
> >>          * clear cpu->icount_decr.u16.high.
> >>          */
> >>         return;
> >>     }
> >
> > I tried this approach and it didn't work.
> > I think iothread sets u16.high flag after resetting it in cpu_handle_interrupt.
> 
> But why is this a problem?  The TB would exit immediately and go again
> to cpu_handle_interrupt.  cpu_handle_interrupt returns true and
> cpu_handle_exception causes the exception via cpu_exec_nocache.

I've tested your variant more thoroughly.
It seems, that iothread calls cpu_exec between atomic_set(&cpu->icount_decr.u16.high, 0); 
in cpu_handle_interrupt and cpu_exec_nocache in cpu_handle_exception.
I see no other reason, because this happens not for the every time.
And cpu_handle_interrupt is not called again, because cpu_handle_exception returns true.
Therefore we have an infinite loop, because no other code here resets cpu->icount_decr.u16.high.

Pavel Dovgalyuk
Paolo Bonzini Nov. 10, 2017, 8:31 a.m. UTC | #11
On 10/11/2017 09:20, Pavel Dovgalyuk wrote:
>> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
>> On 03/11/2017 09:27, Pavel Dovgalyuk wrote:
>>>> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
>>>> On 02/11/2017 12:33, Paolo Bonzini wrote:
>>>>> On 02/11/2017 12:24, Pavel Dovgalyuk wrote:
>>>>>>> I am not sure about this.  I think if instead you should return false
>>>>>>> from here and EXCP_INTERRUPT from cpu_exec.
>>>>>> The problem is inside the TB. It checks cpu->icount_decr.u16.high which is -1.
>>>>>> And we have to enter the TB to cause an exception (because it exists in replay log).
>>>>>> That is why we reset this flag and try to execute the TB.
>>>>>
>>>>> But if u16.high is -1, shouldn't you return EXCP_INTERRUPT first (via
>>>>> "Finally, check if we need to exit to the main loop" in
>>>>> cpu_handle_interrupt)?  Then only cause the exception when that one is
>>>>> processed.
>>>>
>>>> ... indeed, you probably need something like:
>>>>
>>>>     /* Clear the interrupt flag now since we're processing
>>>>      * cpu->interrupt_request and cpu->exit_request.
>>>>      */
>>>>     insns_left = atomic_read(&cpu->icount_decr.u32);
>>>>     atomic_set(&cpu->icount_decr.u16.high, 0);
>>>>     if (unlikely(insns_left < 0) {
>>>>         /* Ensure the zeroing of icount_decr comes before the next read
>>>>          * of cpu->exit_request or cpu->interrupt_request.
>>>>          */
>>>>         smb_mb();
>>>>     }
>>>>
>>>> at the top of cpu_handle_interrupt.  Then you can remove the same
>>>> atomic_set+smp_mb in cpu_loop_exec_tb, like
>>>>
>>>>     *last_tb = NULL;
>>>>     insns_left = atomic_read(&cpu->icount_decr.u32);
>>>>     if (insns_left < 0) {
>>>>         /* Something asked us to stop executing chained TBs; just
>>>>          * continue round the main loop. Whatever requested the exit
>>>>          * will also have set something else (eg exit_request or
>>>>          * interrupt_request) which will be handled by
>>>>          * cpu_handle_interrupt.  cpu_handle_interrupt will also
>>>>          * clear cpu->icount_decr.u16.high.
>>>>          */
>>>>         return;
>>>>     }
>>>
>>> I tried this approach and it didn't work.
>>> I think iothread sets u16.high flag after resetting it in cpu_handle_interrupt.
>>
>> But why is this a problem?  The TB would exit immediately and go again
>> to cpu_handle_interrupt.  cpu_handle_interrupt returns true and
>> cpu_handle_exception causes the exception via cpu_exec_nocache.
> 
> I've tested your variant more thoroughly.
> It seems, that iothread calls cpu_exec between atomic_set(&cpu->icount_decr.u16.high, 0); 
> in cpu_handle_interrupt and cpu_exec_nocache in cpu_handle_exception.
> I see no other reason, because this happens not for the every time.
> And cpu_handle_interrupt is not called again, because cpu_handle_exception returns true.
> Therefore we have an infinite loop, because no other code here resets cpu->icount_decr.u16.high.

Then returning true unconditionally is wrong in the cpu_exec_nocache
case.  What if you do:

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 61297f8f4a..fb5446be3e 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -470,7 +470,19 @@ static inline void cpu_handle_debug_exception(CPUState *cpu)
 
 static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
 {
-    if (cpu->exception_index >= 0) {
+    if (cpu->exception_index < 0) {
+#ifndef CONFIG_USER_ONLY
+        if (replay_has_exception()
+            && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
+            /* try to cause an exception pending in the log */
+            cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0, curr_cflags()), true);
+        }
+#endif
+        if (cpu->exception_index < 0) {
+            return;
+        }
+    }
+
         if (cpu->exception_index >= EXCP_INTERRUPT) {
             /* exit request from the cpu execution loop */
             *ret = cpu->exception_index;
@@ -505,16 +517,6 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
             }
 #endif
         }
-#ifndef CONFIG_USER_ONLY
-    } else if (replay_has_exception()
-               && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
-        /* try to cause an exception pending in the log */
-        cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0, curr_cflags()), true);
-        *ret = -1;
-        return true;
-#endif
-    }
-
     return false;
 }
Pavel Dovgalyuk Nov. 10, 2017, 12:29 p.m. UTC | #12
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> >>>
> >>> I tried this approach and it didn't work.
> >>> I think iothread sets u16.high flag after resetting it in cpu_handle_interrupt.
> >>
> >> But why is this a problem?  The TB would exit immediately and go again
> >> to cpu_handle_interrupt.  cpu_handle_interrupt returns true and
> >> cpu_handle_exception causes the exception via cpu_exec_nocache.
> >
> > I've tested your variant more thoroughly.
> > It seems, that iothread calls cpu_exec between atomic_set(&cpu->icount_decr.u16.high, 0);
> > in cpu_handle_interrupt and cpu_exec_nocache in cpu_handle_exception.
> > I see no other reason, because this happens not for the every time.
> > And cpu_handle_interrupt is not called again, because cpu_handle_exception returns true.
> > Therefore we have an infinite loop, because no other code here resets cpu-
> >icount_decr.u16.high.
> 
> Then returning true unconditionally is wrong in the cpu_exec_nocache
> case.  What if you do:
> 
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index 61297f8f4a..fb5446be3e 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -470,7 +470,19 @@ static inline void cpu_handle_debug_exception(CPUState *cpu)
> 
>  static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
>  {
> -    if (cpu->exception_index >= 0) {
> +    if (cpu->exception_index < 0) {
> +#ifndef CONFIG_USER_ONLY
> +        if (replay_has_exception()
> +            && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
> +            /* try to cause an exception pending in the log */
> +            cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0, curr_cflags()), true);
> +        }
> +#endif
> +        if (cpu->exception_index < 0) {
> +            return;

return false, I guess?
This approach allows iterating in case of races
and QEMU does not hangs anymore at replay.

> +        }
> +    }
> +
>          if (cpu->exception_index >= EXCP_INTERRUPT) {
>              /* exit request from the cpu execution loop */
>              *ret = cpu->exception_index;
> @@ -505,16 +517,6 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
>              }
>  #endif
>          }
> -#ifndef CONFIG_USER_ONLY
> -    } else if (replay_has_exception()
> -               && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
> -        /* try to cause an exception pending in the log */
> -        cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0, curr_cflags()), true);
> -        *ret = -1;
> -        return true;
> -#endif
> -    }
> -
>      return false;
>  }
> 


Pavel Dovgalyuk
Paolo Bonzini Nov. 10, 2017, 1:12 p.m. UTC | #13
On 10/11/2017 13:29, Pavel Dovgalyuk wrote:
>> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
>>>>>
>>>>> I tried this approach and it didn't work.
>>>>> I think iothread sets u16.high flag after resetting it in cpu_handle_interrupt.
>>>>
>>>> But why is this a problem?  The TB would exit immediately and go again
>>>> to cpu_handle_interrupt.  cpu_handle_interrupt returns true and
>>>> cpu_handle_exception causes the exception via cpu_exec_nocache.
>>>
>>> I've tested your variant more thoroughly.
>>> It seems, that iothread calls cpu_exec between atomic_set(&cpu->icount_decr.u16.high, 0);
>>> in cpu_handle_interrupt and cpu_exec_nocache in cpu_handle_exception.
>>> I see no other reason, because this happens not for the every time.
>>> And cpu_handle_interrupt is not called again, because cpu_handle_exception returns true.
>>> Therefore we have an infinite loop, because no other code here resets cpu-
>>> icount_decr.u16.high.
>>
>> Then returning true unconditionally is wrong in the cpu_exec_nocache
>> case.  What if you do:
>>
>> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
>> index 61297f8f4a..fb5446be3e 100644
>> --- a/accel/tcg/cpu-exec.c
>> +++ b/accel/tcg/cpu-exec.c
>> @@ -470,7 +470,19 @@ static inline void cpu_handle_debug_exception(CPUState *cpu)
>>
>>  static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
>>  {
>> -    if (cpu->exception_index >= 0) {
>> +    if (cpu->exception_index < 0) {
>> +#ifndef CONFIG_USER_ONLY
>> +        if (replay_has_exception()
>> +            && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
>> +            /* try to cause an exception pending in the log */
>> +            cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0, curr_cflags()), true);
>> +        }
>> +#endif
>> +        if (cpu->exception_index < 0) {
>> +            return;
> 
> return false, I guess?
> This approach allows iterating in case of races
> and QEMU does not hangs anymore at replay.

Great, can you put this change the next time you send your series?
There are some parts that can definitely go in for 2.11.

Thanks,

Paolo

>> +        }
>> +    }
>> +
>>          if (cpu->exception_index >= EXCP_INTERRUPT) {
>>              /* exit request from the cpu execution loop */
>>              *ret = cpu->exception_index;
>> @@ -505,16 +517,6 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
>>              }
>>  #endif
>>          }
>> -#ifndef CONFIG_USER_ONLY
>> -    } else if (replay_has_exception()
>> -               && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
>> -        /* try to cause an exception pending in the log */
>> -        cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0, curr_cflags()), true);
>> -        *ret = -1;
>> -        return true;
>> -#endif
>> -    }
>> -
>>      return false;
>>  }
>>
> 
> 
> Pavel Dovgalyuk
>
diff mbox series

Patch

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 35d0240..aaa9c2d 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -500,6 +500,7 @@  static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
     } else if (replay_has_exception()
                && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
         /* try to cause an exception pending in the log */
+        atomic_set(&cpu->icount_decr.u16.high, 0);
         cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0, curr_cflags()), true);
         *ret = -1;
         return true;