diff mbox

ui/vnc-jobs: Delete unused and buggy vnc_stop_worker_thread()

Message ID 1350570527-24187-1-git-send-email-peter.maydell@linaro.org
State New
Headers show

Commit Message

Peter Maydell Oct. 18, 2012, 2:28 p.m. UTC
The function vnc_stop_worker_thread() is buggy, beacuse it tries to
delete jobs from the worker thread's queue but the worker thread itself
will not cope with this happening (it would end up trying to remove
an already-removed list item from its queue list). Fortunately
nobody ever calls vnc_stop_worker_thread(), so we can fix this by
simply deleting all the untested racy code.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
Seems the easiest way to deal with this bug spotted via code
inspection :-)

 ui/vnc-jobs.c |   26 --------------------------
 ui/vnc-jobs.h |    2 --
 2 files changed, 28 deletions(-)

Comments

Paolo Bonzini Oct. 18, 2012, 3:01 p.m. UTC | #1
Il 18/10/2012 16:28, Peter Maydell ha scritto:
> The function vnc_stop_worker_thread() is buggy, beacuse it tries to
> delete jobs from the worker thread's queue but the worker thread itself
> will not cope with this happening (it would end up trying to remove
> an already-removed list item from its queue list). Fortunately
> nobody ever calls vnc_stop_worker_thread(), so we can fix this by
> simply deleting all the untested racy code.

Note that there is just one queue.  The queue global == the arg argument
of vnc_worker_thread == the queue argument of vnc_worker_thread_loop.
So I'm not sure I follow your reasoning.

So the bug may be that we never call vnc_stop_worker_thread from
vnc_disconnect_finish.  BTW vnc_jobs_join is called there so we could
just assert that the queue is empty...

Paolo

> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> Seems the easiest way to deal with this bug spotted via code
> inspection :-)
Peter Maydell Oct. 18, 2012, 3:12 p.m. UTC | #2
On 18 October 2012 16:01, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 18/10/2012 16:28, Peter Maydell ha scritto:
>> The function vnc_stop_worker_thread() is buggy, beacuse it tries to
>> delete jobs from the worker thread's queue but the worker thread itself
>> will not cope with this happening (it would end up trying to remove
>> an already-removed list item from its queue list). Fortunately
>> nobody ever calls vnc_stop_worker_thread(), so we can fix this by
>> simply deleting all the untested racy code.
>
> Note that there is just one queue.  The queue global == the arg argument
> of vnc_worker_thread == the queue argument of vnc_worker_thread_loop.
> So I'm not sure I follow your reasoning.

vnc_worker_thread_loop does this:
 lock queue
 get pointer to first job in queue
 unlock queue
 do stuff with job
 lock queue
 QTAILQ_REMOVE(&queue->jobs, job, next)
 unlock queue
 g_free(job)

vnc_jobs_clear does this:
 lock queue
 QTAILQ_REMOVE each job from the queue
 unlock queue

So two problems:
(1) any job removed by vnc_jobs_clear is never g_free()d
(2) if vnc_jobs_clear removes job X from the queue while the worker loop
    is in its "do stuff with job" phase, then the worker loop will
    subsequently try to QTAILQ_REMOVE an object from a list it is not
    in, which will probably crash or otherwise misbehave

Here's a third which I just noticed:
(3) vnc_stop_worker thread sets queue->exit to true, and then does
 a number of things with 'queue'. However, as soon as you've set
 queue->exit you can't touch 'queue' again, because the worker
 thread might (if you were unlucky) be just about to do the
 'if (queue->exit) { return -1; }', which will cause it to destroy
 the condvar and muteux and free the queue memory. In particular,
 even doing the 'vnc_unlock_queue()' in vnc_stop_worker_thread()
 is unsafe, because the worker thread does its exit-check without
 the lock held, so it could destroy the mutex before you unlocked it.

> So the bug may be that we never call vnc_stop_worker_thread from
> vnc_disconnect_finish.  BTW vnc_jobs_join is called there so we could
> just assert that the queue is empty...

Yes, actually trying to find somewhere to make a clean shutdown
of the worker thread and fixing all the bugs in the shutdown
code would be the other approach. That seems like hard work to me :-)

-- PMM
Paolo Bonzini Oct. 18, 2012, 3:36 p.m. UTC | #3
Il 18/10/2012 17:12, Peter Maydell ha scritto:
> On 18 October 2012 16:01, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> Il 18/10/2012 16:28, Peter Maydell ha scritto:
>>> The function vnc_stop_worker_thread() is buggy, beacuse it tries to
>>> delete jobs from the worker thread's queue but the worker thread itself
>>> will not cope with this happening (it would end up trying to remove
>>> an already-removed list item from its queue list). Fortunately
>>> nobody ever calls vnc_stop_worker_thread(), so we can fix this by
>>> simply deleting all the untested racy code.
>>
>> Note that there is just one queue.  The queue global == the arg argument
>> of vnc_worker_thread == the queue argument of vnc_worker_thread_loop.
>> So I'm not sure I follow your reasoning.
> 
> vnc_worker_thread_loop does this:
>  lock queue
>  get pointer to first job in queue
>  unlock queue
>  do stuff with job
>  lock queue
>  QTAILQ_REMOVE(&queue->jobs, job, next)
>  unlock queue

Uh, right... the QTAILQ_REMOVE looks completely misplaced, otoh
vnc_has_job relies on that.

>  g_free(job)
> 
> vnc_jobs_clear does this:
>  lock queue
>  QTAILQ_REMOVE each job from the queue
>  unlock queue
> 
> So two problems:
> (1) any job removed by vnc_jobs_clear is never g_free()d
> (2) if vnc_jobs_clear removes job X from the queue while the worker loop
>     is in its "do stuff with job" phase, then the worker loop will
>     subsequently try to QTAILQ_REMOVE an object from a list it is not
>     in, which will probably crash or otherwise misbehave

... but vnc_jobs_clear should be dead code because vnc_jobs_join is
called first.  So the "right fix" would include anyway the half of your
patch that deletes vnc_jobs_clear, and would revert the other half...

> Here's a third which I just noticed:
> (3) vnc_stop_worker thread sets queue->exit to true, and then does
>  a number of things with 'queue'. However, as soon as you've set
>  queue->exit you can't touch 'queue' again, because the worker
>  thread might (if you were unlucky) be just about to do the
>  'if (queue->exit) { return -1; }', which will cause it to destroy
>  the condvar and muteux and free the queue memory. In particular,
>  even doing the 'vnc_unlock_queue()' in vnc_stop_worker_thread()
>  is unsafe, because the worker thread does its exit-check without
>  the lock held, so it could destroy the mutex before you unlocked it.

Right, the solution for this is to move the destruction to the caller,
because now we have joinable QemuThreads and those are a better way to
synchronize.

Paolo

>> So the bug may be that we never call vnc_stop_worker_thread from
>> vnc_disconnect_finish.  BTW vnc_jobs_join is called there so we could
>> just assert that the queue is empty...
> 
> Yes, actually trying to find somewhere to make a clean shutdown
> of the worker thread and fixing all the bugs in the shutdown
> code would be the other approach. That seems like hard work to me :-)
> 
> -- PMM
> 
>
Peter Maydell Oct. 18, 2012, 3:45 p.m. UTC | #4
Another bug:

(4) if vnc_job_push() discovers that queue->exit is true it
will free the job but leak its rectangle list
(5) the early-exit ("goto disconnected") code paths in
vnc_worker_thread_loop() also leak the rectangle list

And a couple of inefficiencies/oddities which aren't actually bugs:

(6) An inefficiency (unnecessary lock/unlocks):
the code locks and unlocks the queue in vnc_job_new() and
vnc_job_add_rect() when it is manipulating the job->rectangles list.
In the former case this is definitely pointless -- we've just alloc'd
the VncJob struct so it's impossible for anybody else to be trying to
use the list yet. In the latter it's not necessary since the semantics
are that we create a job with vnc_job_new, fill it up with rectangles
and then hand it off to the queue with vnc_job_push(). So we know
only one thread will be touching the job before it goes in the queue
and grabbing the queue lock is unnecessary overhead.

(7) vnc_has_job() is unused, which is just as well because it's
pretty hard to use non-racily, since the true/false answer it
returns could be made wrong as soon as it drops the queue lock.
vnc_has_job_locked() is OK (as is its usage pattern in _join).

-- PMM
diff mbox

Patch

diff --git a/ui/vnc-jobs.c b/ui/vnc-jobs.c
index 087b84d..c5ce2f1 100644
--- a/ui/vnc-jobs.c
+++ b/ui/vnc-jobs.c
@@ -136,19 +136,6 @@  bool vnc_has_job(VncState *vs)
     return ret;
 }
 
-void vnc_jobs_clear(VncState *vs)
-{
-    VncJob *job, *tmp;
-
-    vnc_lock_queue(queue);
-    QTAILQ_FOREACH_SAFE(job, &queue->jobs, next, tmp) {
-        if (job->vs == vs || !vs) {
-            QTAILQ_REMOVE(&queue->jobs, job, next);
-        }
-    }
-    vnc_unlock_queue(queue);
-}
-
 void vnc_jobs_join(VncState *vs)
 {
     vnc_lock_queue(queue);
@@ -336,16 +323,3 @@  bool vnc_worker_thread_running(void)
 {
     return queue; /* Check global queue */
 }
-
-void vnc_stop_worker_thread(void)
-{
-    if (!vnc_worker_thread_running())
-        return ;
-
-    /* Remove all jobs and wake up the thread */
-    vnc_lock_queue(queue);
-    queue->exit = true;
-    vnc_unlock_queue(queue);
-    vnc_jobs_clear(NULL);
-    qemu_cond_broadcast(&queue->cond);
-}
diff --git a/ui/vnc-jobs.h b/ui/vnc-jobs.h
index 86e6d88..b87857f 100644
--- a/ui/vnc-jobs.h
+++ b/ui/vnc-jobs.h
@@ -35,13 +35,11 @@  VncJob *vnc_job_new(VncState *vs);
 int vnc_job_add_rect(VncJob *job, int x, int y, int w, int h);
 void vnc_job_push(VncJob *job);
 bool vnc_has_job(VncState *vs);
-void vnc_jobs_clear(VncState *vs);
 void vnc_jobs_join(VncState *vs);
 
 void vnc_jobs_consume_buffer(VncState *vs);
 void vnc_start_worker_thread(void);
 bool vnc_worker_thread_running(void);
-void vnc_stop_worker_thread(void);
 
 /* Locks */
 static inline int vnc_trylock_display(VncDisplay *vd)