Message ID | 1435010055-4584-1-git-send-email-zavadovsky.yan@gmail.com |
---|---|
State | New |
Headers | show |
Am 22.06.2015 um 23:54 schrieb Zavadovsky Yan: > Calling SuspendThread() is not enough to suspend Win32 thread. > We need to call GetThreadContext() after SuspendThread() > to make sure that OS have really suspended target thread. > But GetThreadContext() needs for THREAD_GET_CONTEXT > access right on thread object. > > This patch adds THREAD_GET_CONTEXT to OpenThread() arguments > and change 'while(GetThreadContext() == SUCCESS)' to > 'while(GetThreadContext() == FAILED)'. > So this 'while' loop will stop only after successful grabbing > of thread context(i.e. when thread is really suspended). > Not after the one failed GetThreadContext() call. > > Signed-off-by: Zavadovsky Yan <zavadovsky.yan@gmail.com> > --- > cpus.c | 2 +- > util/qemu-thread-win32.c | 4 ++-- > 2 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/cpus.c b/cpus.c > index b85fb5f..83d5eb5 100644 > --- a/cpus.c > +++ b/cpus.c > @@ -1097,7 +1097,7 @@ static void qemu_cpu_kick_thread(CPUState *cpu) > * suspended until we can get the context. > */ > tcgContext.ContextFlags = CONTEXT_CONTROL; > - while (GetThreadContext(cpu->hThread, &tcgContext) != 0) { > + while (GetThreadContext(cpu->hThread, &tcgContext) == 0) { > continue; > } > > diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c > index 406b52f..823eca1 100644 > --- a/util/qemu-thread-win32.c > +++ b/util/qemu-thread-win32.c > @@ -406,8 +406,8 @@ HANDLE qemu_thread_get_handle(QemuThread *thread) > > EnterCriticalSection(&data->cs); > if (!data->exited) { > - handle = OpenThread(SYNCHRONIZE | THREAD_SUSPEND_RESUME, FALSE, > - thread->tid); > + handle = OpenThread(SYNCHRONIZE | THREAD_SUSPEND_RESUME | THREAD_GET_CONTEXT, > + FALSE, thread->tid); > } else { > handle = NULL; > } I added the contributers of the original code to the cc list. The modifications look reasonable - if GetThreadContext is needed at all. We should add an URL to reliable documentation which supports that claim. Is it a good idea to run a busy waiting loop? Or would a Sleep(0) in the loop be better (it allows other threads to run, maybe it helps them to suspend, too). Regards Stefan
On 06/23/2015 08:02 AM, Stefan Weil wrote: > Am 22.06.2015 um 23:54 schrieb Zavadovsky Yan: >> Calling SuspendThread() is not enough to suspend Win32 thread. >> We need to call GetThreadContext() after SuspendThread() >> to make sure that OS have really suspended target thread. >> But GetThreadContext() needs for THREAD_GET_CONTEXT >> access right on thread object. >> >> This patch adds THREAD_GET_CONTEXT to OpenThread() arguments >> and change 'while(GetThreadContext() == SUCCESS)' to >> 'while(GetThreadContext() == FAILED)'. >> So this 'while' loop will stop only after successful grabbing >> of thread context(i.e. when thread is really suspended). >> Not after the one failed GetThreadContext() call. >> >> Signed-off-by: Zavadovsky Yan <zavadovsky.yan@gmail.com> >> --- >> cpus.c | 2 +- >> util/qemu-thread-win32.c | 4 ++-- >> 2 files changed, 3 insertions(+), 3 deletions(-) >> >> diff --git a/cpus.c b/cpus.c >> index b85fb5f..83d5eb5 100644 >> --- a/cpus.c >> +++ b/cpus.c >> @@ -1097,7 +1097,7 @@ static void qemu_cpu_kick_thread(CPUState *cpu) >> * suspended until we can get the context. >> */ >> tcgContext.ContextFlags = CONTEXT_CONTROL; >> - while (GetThreadContext(cpu->hThread, &tcgContext) != 0) { >> + while (GetThreadContext(cpu->hThread, &tcgContext) == 0) { >> continue; This looks like a reasonable change, right now I don't understand why I did it the other way... >> } >> diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c >> index 406b52f..823eca1 100644 >> --- a/util/qemu-thread-win32.c >> +++ b/util/qemu-thread-win32.c >> @@ -406,8 +406,8 @@ HANDLE qemu_thread_get_handle(QemuThread *thread) >> EnterCriticalSection(&data->cs); >> if (!data->exited) { >> - handle = OpenThread(SYNCHRONIZE | THREAD_SUSPEND_RESUME, FALSE, >> - thread->tid); >> + handle = OpenThread(SYNCHRONIZE | THREAD_SUSPEND_RESUME | THREAD_GET_CONTEXT, >> + FALSE, thread->tid); >> } else { >> handle = NULL; >> } > > > I added the contributers of the original code to the cc list. > > The modifications look reasonable - if GetThreadContext is needed at all. > We should add an URL to reliable documentation which supports that > claim. > The reason we need this call is on multi-processor host, when the TCG thread and the IO-loop thread don't run on the same CPU. So in this situation the function SuspendThread can return even before the thread (running on another CPU) is effectively suspended. Unfortunately this is not really documented by Microsoft an we found that information somewhere on Internet (if you want I can search the source again but there's nothing official) after countless hours of debugging a very nasty race condition caused by this undocumented behavior. Maybe this is not explicit enough and the comments need to be updated. > Is it a good idea to run a busy waiting loop? Or would a Sleep(0) in > the loop be better (it allows other threads to run, maybe it helps > them to suspend, too). > Maybe we can, but the "while" will only loop when threads are running on different CPU, so the other thread is already running and calling sleep will not help I think. I hope this is clear, as I said we spent a huge amount of time debugging this about a year and a half ago. The bug would append once every several thousands tests. QEMU thread code is very "sensitive" on Windows so we should be careful. Yan, if you didn't already, I recommend you extensively test this modification. By extensively, I mean running QEMU several thousands of time on an SMP host (with many CPUs like 8 or 16 if possible). Regards,
Hello. On Tue, Jun 23, 2015 at 9:02 AM, Stefan Weil <sw@weilnetz.de> wrote: > Am 22.06.2015 um 23:54 schrieb Zavadovsky Yan: > >> Calling SuspendThread() is not enough to suspend Win32 thread. >> We need to call GetThreadContext() after SuspendThread() >> to make sure that OS have really suspended target thread. >> But GetThreadContext() needs for THREAD_GET_CONTEXT >> access right on thread object. >> >> This patch adds THREAD_GET_CONTEXT to OpenThread() arguments >> and change 'while(GetThreadContext() == SUCCESS)' to >> 'while(GetThreadContext() == FAILED)'. >> So this 'while' loop will stop only after successful grabbing >> of thread context(i.e. when thread is really suspended). >> Not after the one failed GetThreadContext() call. >> >> Signed-off-by: Zavadovsky Yan <zavadovsky.yan@gmail.com> >> --- >> cpus.c | 2 +- >> util/qemu-thread-win32.c | 4 ++-- >> 2 files changed, 3 insertions(+), 3 deletions(-) >> >> diff --git a/cpus.c b/cpus.c >> index b85fb5f..83d5eb5 100644 >> --- a/cpus.c >> +++ b/cpus.c >> @@ -1097,7 +1097,7 @@ static void qemu_cpu_kick_thread(CPUState *cpu) >> * suspended until we can get the context. >> */ >> tcgContext.ContextFlags = CONTEXT_CONTROL; >> - while (GetThreadContext(cpu->hThread, &tcgContext) != 0) { >> + while (GetThreadContext(cpu->hThread, &tcgContext) == 0) { >> continue; >> } >> diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c >> index 406b52f..823eca1 100644 >> --- a/util/qemu-thread-win32.c >> +++ b/util/qemu-thread-win32.c >> @@ -406,8 +406,8 @@ HANDLE qemu_thread_get_handle(QemuThread *thread) >> EnterCriticalSection(&data->cs); >> if (!data->exited) { >> - handle = OpenThread(SYNCHRONIZE | THREAD_SUSPEND_RESUME, FALSE, >> - thread->tid); >> + handle = OpenThread(SYNCHRONIZE | THREAD_SUSPEND_RESUME | >> THREAD_GET_CONTEXT, >> + FALSE, thread->tid); >> } else { >> handle = NULL; >> } >> > > > I added the contributers of the original code to the cc list. > > The modifications look reasonable - if GetThreadContext is needed at all. > GetThreadContext is needed to avoid races on multicore system. As it wrote in comment from original contributors of this code. We should add an URL to reliable documentation which supports that > claim. > Unfortunately, MSDN says only "SuspendThread suspends the thread. It's designed for debuggers. Don't use in applications.": https://msdn.microsoft.com/en-us/library/windows/desktop/ms686345(v=vs.85).aspx And nothing more useful. So when I found this piece of code with Suspend/Resume and failed GetContext I did some googling. And found this article: http://blogs.msdn.com/b/oldnewthing/archive/2015/02/05/10591215.aspx In which: >The SuspendThread function tells the scheduler to suspend the thread but does not wait for an acknowledgment from the scheduler that the suspension has actually occurred >If you want to make sure the thread really is suspended, you need to perform a synchronous operation that is dependent on the fact that the thread is suspended >The traditional way of doing this is to call GetThreadContext, since this requires the kernel to read from the context of the suspended thread, which has as a prerequisite that the context be saved in the first place, which has as a prerequisite that the thread be suspended > Is it a good idea to run a busy waiting loop? I think no. Because infinite loop is possible. If someone make regress in future. Maybe better use same handling as Suspend/Resume uses? 'if (GetThreadContext() == FAILED) { print_error; error_exit; }' GetThreadContext either works or not. So if it fails first call it will fail all next calls. > Or would a Sleep(0) No. Sleep is "hardcode" without any guarantees.
On 23 June 2015 at 10:55, Ян Завадовский <zavadovsky.yan@gmail.com> wrote: > On Tue, Jun 23, 2015 at 9:02 AM, Stefan Weil <sw@weilnetz.de> wrote: >> We should add an URL to reliable documentation which supports that >> claim. > > Unfortunately, MSDN says only "SuspendThread suspends the thread. It's > designed for debuggers. Don't use in applications.": > https://msdn.microsoft.com/en-us/library/windows/desktop/ms686345(v=vs.85).aspx > And nothing more useful. > So when I found this piece of code with Suspend/Resume and failed GetContext > I did some googling. > And found this article: > http://blogs.msdn.com/b/oldnewthing/archive/2015/02/05/10591215.aspx Personally I am happy to treat a Raymond Chen blog post as "reliable documentation"... -- PMM
On 23/06/2015 12:30, Peter Maydell wrote: > On 23 June 2015 at 10:55, Ян Завадовский <zavadovsky.yan@gmail.com> wrote: >> On Tue, Jun 23, 2015 at 9:02 AM, Stefan Weil <sw@weilnetz.de> wrote: >>> We should add an URL to reliable documentation which supports that >>> claim. >> >> Unfortunately, MSDN says only "SuspendThread suspends the thread. It's >> designed for debuggers. Don't use in applications.": >> https://msdn.microsoft.com/en-us/library/windows/desktop/ms686345(v=vs.85).aspx >> And nothing more useful. >> So when I found this piece of code with Suspend/Resume and failed GetContext >> I did some googling. >> And found this article: >> http://blogs.msdn.com/b/oldnewthing/archive/2015/02/05/10591215.aspx > > Personally I am happy to treat a Raymond Chen blog post as "reliable > documentation"... Me too. :) SuspendThread was pretty much the only way to emulate signals. Initially I used SetThreadContext to redirect execution to cpu_signal; that was more complicated, but in retrospect it would have avoided the problems with memory barriers and with asynchronous SuspendThread. It certainly would have saved the AdaCore people a lot of debugging time. :( For 2.5, however, I wonder if SuspendThread/ResumeThread is needed at all now that cpu_exit doesn't have to undo block chaining anymore. Even on POSIX platforms the signal might not be necessary anymore. Paolo
On Tue, Jun 23, 2015 at 12:46:59PM +0200, Paolo Bonzini wrote: > > > On 23/06/2015 12:30, Peter Maydell wrote: > > On 23 June 2015 at 10:55, Ян Завадовский <zavadovsky.yan@gmail.com> wrote: > >> On Tue, Jun 23, 2015 at 9:02 AM, Stefan Weil <sw@weilnetz.de> wrote: > >>> We should add an URL to reliable documentation which supports that > >>> claim. > >> > >> Unfortunately, MSDN says only "SuspendThread suspends the thread. It's > >> designed for debuggers. Don't use in applications.": > >> https://msdn.microsoft.com/en-us/library/windows/desktop/ms686345(v=vs.85).aspx > >> And nothing more useful. > >> So when I found this piece of code with Suspend/Resume and failed GetContext > >> I did some googling. > >> And found this article: > >> http://blogs.msdn.com/b/oldnewthing/archive/2015/02/05/10591215.aspx > > > > Personally I am happy to treat a Raymond Chen blog post as "reliable > > documentation"... > > Me too. :) > > SuspendThread was pretty much the only way to emulate signals. > Initially I used SetThreadContext to redirect execution to cpu_signal; > that was more complicated, but in retrospect it would have avoided the > problems with memory barriers and with asynchronous SuspendThread. It > certainly would have saved the AdaCore people a lot of debugging time. :( > > For 2.5, however, I wonder if SuspendThread/ResumeThread is needed at > all now that cpu_exit doesn't have to undo block chaining anymore. Even > on POSIX platforms the signal might not be necessary anymore. If you don't have that signal / SuspendThread/ResumtThread requirement, might that enable QEMU to just depend on the winpthreads library that is provided by Mingw project, and not bother reinventing the wheel for thread library portabilty ? Regards, Daniel
On 23 June 2015 at 11:46, Paolo Bonzini <pbonzini@redhat.com> wrote: > SuspendThread was pretty much the only way to emulate signals. > Initially I used SetThreadContext to redirect execution to cpu_signal; > that was more complicated, but in retrospect it would have avoided the > problems with memory barriers and with asynchronous SuspendThread. It > certainly would have saved the AdaCore people a lot of debugging time. :( > > For 2.5, however, I wonder if SuspendThread/ResumeThread is needed at > all now that cpu_exit doesn't have to undo block chaining anymore. Even > on POSIX platforms the signal might not be necessary anymore. Yeah, I was wondering that too. All we're really doing is setting three flag variables, so the suspend/resume or signal is just getting us atomicity of those flag changes. It might be possible to redesign things a bit to not require the atomicity part. -- PMM
On 23/06/2015 13:18, Daniel P. Berrange wrote: > > For 2.5, however, I wonder if SuspendThread/ResumeThread is needed at > > all now that cpu_exit doesn't have to undo block chaining anymore. Even > > on POSIX platforms the signal might not be necessary anymore. > > If you don't have that signal / SuspendThread/ResumtThread requirement, That was independent of QEMU reinventing the wheel for mutexes/condvars. > might that enable QEMU to just depend on the winpthreads library that > is provided by Mingw project, and not bother reinventing the wheel for > thread library portabilty ? We can and should just reuse glib these days as much as we can (probably not entirely because glib doesn't have detached threads). At least a few years ago, winpthreads was much slower than native Win32, which is why everyone reinvents the wheel. I was planning to do it in 2.5. Paolo
On Tue, Jun 23, 2015 at 01:32:19PM +0200, Paolo Bonzini wrote: > > > On 23/06/2015 13:18, Daniel P. Berrange wrote: > > > For 2.5, however, I wonder if SuspendThread/ResumeThread is needed at > > > all now that cpu_exit doesn't have to undo block chaining anymore. Even > > > on POSIX platforms the signal might not be necessary anymore. > > > > If you don't have that signal / SuspendThread/ResumtThread requirement, > > That was independent of QEMU reinventing the wheel for mutexes/condvars. > > > might that enable QEMU to just depend on the winpthreads library that > > is provided by Mingw project, and not bother reinventing the wheel for > > thread library portabilty ? > > We can and should just reuse glib these days as much as we can (probably > not entirely because glib doesn't have detached threads). At least a > few years ago, winpthreads was much slower than native Win32, which is > why everyone reinvents the wheel. Are you sure that was wrt the (new) winpthreads library maintained by Mingw64 team, and not the confusingly similar pthreads-win32 library ? Regards, Daniel
On 23/06/2015 13:43, Daniel P. Berrange wrote: > > We can and should just reuse glib these days as much as we can (probably > > not entirely because glib doesn't have detached threads). At least a > > few years ago, winpthreads was much slower than native Win32, which is > > why everyone reinvents the wheel. > > Are you sure that was wrt the (new) winpthreads library maintained by > Mingw64 team, and not the confusingly similar pthreads-win32 library ? None of them use native Win32 condition variables, which is the main reason to switch to glib's version. Paolo
Am 23.06.2015 um 12:46 schrieb Paolo Bonzini: > On 23/06/2015 12:30, Peter Maydell wrote: >> On 23 June 2015 at 10:55, Ян Завадовский <zavadovsky.yan@gmail.com> wrote: >>> On Tue, Jun 23, 2015 at 9:02 AM, Stefan Weil <sw@weilnetz.de> wrote: >>>> We should add an URL to reliable documentation which supports that >>>> claim. >>> Unfortunately, MSDN says only "SuspendThread suspends the thread. It's >>> designed for debuggers. Don't use in applications.": >>> https://msdn.microsoft.com/en-us/library/windows/desktop/ms686345(v=vs.85).aspx >>> And nothing more useful. >>> So when I found this piece of code with Suspend/Resume and failed GetContext >>> I did some googling. >>> And found this article: >>> http://blogs.msdn.com/b/oldnewthing/archive/2015/02/05/10591215.aspx >> Personally I am happy to treat a Raymond Chen blog post as "reliable >> documentation"... > Me too. :) +1 Fabien, I wonder why nobody noticed that the current code did not do what it was written for. As far as I see the threads were created with the wrong options, so GetThreadContext always failed and therefore was only executed once, so there was no waiting for thread suspension. Removing the code would have given identical results. Is that in an indicator that the SuspendThread is not needed at all, as it was discussed in the other e-mails here? Stefan
On 06/23/2015 07:07 PM, Stefan Weil wrote: > Am 23.06.2015 um 12:46 schrieb Paolo Bonzini: >> On 23/06/2015 12:30, Peter Maydell wrote: >>> On 23 June 2015 at 10:55, Ян Завадовский <zavadovsky.yan@gmail.com> wrote: >>>> On Tue, Jun 23, 2015 at 9:02 AM, Stefan Weil <sw@weilnetz.de> wrote: >>>>> We should add an URL to reliable documentation which supports that >>>>> claim. >>>> Unfortunately, MSDN says only "SuspendThread suspends the thread. It's >>>> designed for debuggers. Don't use in applications.": >>>> https://msdn.microsoft.com/en-us/library/windows/desktop/ms686345(v=vs.85).aspx >>>> And nothing more useful. >>>> So when I found this piece of code with Suspend/Resume and failed GetContext >>>> I did some googling. >>>> And found this article: >>>> http://blogs.msdn.com/b/oldnewthing/archive/2015/02/05/10591215.aspx >>> Personally I am happy to treat a Raymond Chen blog post as "reliable >>> documentation"... >> Me too. :) > > +1 > > Fabien, I wonder why nobody noticed that the current > code did not do what it was written for. As far as I see > the threads were created with the wrong options, so > GetThreadContext always failed and therefore was only > executed once, so there was no waiting for thread > suspension. > I'm surprised as well, but we run several hundred thousands of tests every day (one QEMU instance for each test) and before this fix we had a few instances freezing for no reason. We identified a possible race condition on SMP host and the bug disappeared after this fix. Even if the call was erroneous, adding a call to GetThreadContext probably gave more time or forced the suspend request to be effective, it's the only explanation I have right now. But clearly there was a bug, and the call to GetThreadContext fixed it. I found other pieces of code that uses this technique but calling GetThreadContext only once (not in a loop like we did), so maybe it's enough to call it once and the loop is superfluous... > Removing the code would have given identical results. > Considering we are talking about thread synchronization on Windows and SMP host, I would not make that assumption :) > Is that in an indicator that the SuspendThread is not > needed at all, as it was discussed in the other e-mails > here? If we completely change the thread synchronization on Windows, maybe SuspendeThread is not needed anymore, but with the current scheme (at least what I know of it), I don't see how we can remove it. As I said before we must be very careful with this piece of code. Regards,
On 24 June 2015 at 10:09, Fabien Chouteau <chouteau@adacore.com> wrote: > If we completely change the thread synchronization on Windows, maybe > SuspendeThread is not needed anymore, but with the current scheme (at > least what I know of it), I don't see how we can remove it. > > As I said before we must be very careful with this piece of code. Agreed; maybe for 2.5 we can consider a design fix that would let us drop this suspend/resume stuff. For 2.4 we should just make this correction to the current design, I think. -- PMM
diff --git a/cpus.c b/cpus.c index b85fb5f..83d5eb5 100644 --- a/cpus.c +++ b/cpus.c @@ -1097,7 +1097,7 @@ static void qemu_cpu_kick_thread(CPUState *cpu) * suspended until we can get the context. */ tcgContext.ContextFlags = CONTEXT_CONTROL; - while (GetThreadContext(cpu->hThread, &tcgContext) != 0) { + while (GetThreadContext(cpu->hThread, &tcgContext) == 0) { continue; } diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c index 406b52f..823eca1 100644 --- a/util/qemu-thread-win32.c +++ b/util/qemu-thread-win32.c @@ -406,8 +406,8 @@ HANDLE qemu_thread_get_handle(QemuThread *thread) EnterCriticalSection(&data->cs); if (!data->exited) { - handle = OpenThread(SYNCHRONIZE | THREAD_SUSPEND_RESUME, FALSE, - thread->tid); + handle = OpenThread(SYNCHRONIZE | THREAD_SUSPEND_RESUME | THREAD_GET_CONTEXT, + FALSE, thread->tid); } else { handle = NULL; }
Calling SuspendThread() is not enough to suspend Win32 thread. We need to call GetThreadContext() after SuspendThread() to make sure that OS have really suspended target thread. But GetThreadContext() needs for THREAD_GET_CONTEXT access right on thread object. This patch adds THREAD_GET_CONTEXT to OpenThread() arguments and change 'while(GetThreadContext() == SUCCESS)' to 'while(GetThreadContext() == FAILED)'. So this 'while' loop will stop only after successful grabbing of thread context(i.e. when thread is really suspended). Not after the one failed GetThreadContext() call. Signed-off-by: Zavadovsky Yan <zavadovsky.yan@gmail.com> --- cpus.c | 2 +- util/qemu-thread-win32.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-)