Patchwork linux-user: fix segfault deadlock

login
register
mail settings
Submitter Alexander Graf
Date Jan. 13, 2012, 3:46 p.m.
Message ID <1326469565-7736-1-git-send-email-agraf@suse.de>
Download mbox | patch
Permalink /patch/135916/
State New
Headers show

Comments

Alexander Graf - Jan. 13, 2012, 3:46 p.m.
When entering the guest we take a lock to ensure that nobody else messes
with our TB chaining while we're doing it. If we get a segfault inside that
code, we manage to work on, but will not unlock the lock.

This patch forces unlocking of that lock in the segv handler. I'm not sure
this is the right approach though. Maybe we should rather make sure we don't
segfault in the code? I would greatly appreciate someone more intelligible
than me to look at this :).

Example code to trigger this is at: http://csgraf.de/tmp/conftest.c

Reported-by: Fabio Erculiani <lxnay@sabayon.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
---
 qemu-lock.h |   10 ++++++++++
 user-exec.c |    4 ++++
 2 files changed, 14 insertions(+), 0 deletions(-)
Peter Maydell - Jan. 13, 2012, 4:16 p.m.
On 13 January 2012 15:46, Alexander Graf <agraf@suse.de> wrote:
> When entering the guest we take a lock to ensure that nobody else messes
> with our TB chaining while we're doing it. If we get a segfault inside that
> code, we manage to work on, but will not unlock the lock.
>
> This patch forces unlocking of that lock in the segv handler. I'm not sure
> this is the right approach though. Maybe we should rather make sure we don't
> segfault in the code? I would greatly appreciate someone more intelligible
> than me to look at this :).

A segfault while we're walking the TB chains in QEMU C code?
That's just a bug (and we know we have one there) -- we should
fix it rather than papering over it like this.

-- PMM
Alexander Graf - Jan. 13, 2012, 4:21 p.m.
On 13.01.2012, at 17:16, Peter Maydell wrote:

> On 13 January 2012 15:46, Alexander Graf <agraf@suse.de> wrote:
>> When entering the guest we take a lock to ensure that nobody else messes
>> with our TB chaining while we're doing it. If we get a segfault inside that
>> code, we manage to work on, but will not unlock the lock.
>> 
>> This patch forces unlocking of that lock in the segv handler. I'm not sure
>> this is the right approach though. Maybe we should rather make sure we don't
>> segfault in the code? I would greatly appreciate someone more intelligible
>> than me to look at this :).
> 
> A segfault while we're walking the TB chains in QEMU C code?
> That's just a bug (and we know we have one there) -- we should
> fix it rather than papering over it like this.

Well, we're segfaulting in this exact special case which calls setrlimit() before an mmap which fails:


agraf@wichary:/abuild/agraf/buildroot.gmime> sudo chroot .
wichary:/> QEMU_LOG=in_asm,cpu,int,out_asm QEMU_STRACE=1 ./conftest 
32595 brk(NULL) = 0x00012000
32595 uname(0x18800808) = 0
32595 mmap2(NULL,4096,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANONYMOUS,-1,0) = 0x18822000
32595 access("/etc/ld.so.preload",R_OK) = -1 errno=2 (No such file or directory)
32595 open("/etc/ld.so.cache",O_RDONLY) = 4
32595 fstat64(4,0x18800430) = 0
32595 mmap2(NULL,14915,PROT_READ,MAP_PRIVATE,4,0) = 0x18823000
32595 close(4) = 0
32595 open("/lib/libc.so.6",O_RDONLY) = 4
32595 read(4,0x18800534,512) = 512
32595 fstat64(4,0x18800468) = 0
32595 mmap2(NULL,947552,PROT_EXEC|PROT_READ,MAP_PRIVATE|MAP_DENYWRITE,4,0) = 0x18827000
32595 mprotect(0x18902000,28672,PROT_NONE) = 0
32595 mmap2(0x18909000,12288,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_DENYWRITE|MAP_FIXED,4,0xda) = 0x18909000
32595 mmap2(0x1890c000,9568,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED,-1,0) = 0x1890c000
32595 close(4) = 0
32595 open("/lib/libgcc_s.so.1",O_RDONLY) = 4
32595 read(4,0x1880051c,512) = 512
32595 fstat64(4,0x18800450) = 0
32595 mmap2(NULL,69908,PROT_EXEC|PROT_READ,MAP_PRIVATE|MAP_DENYWRITE,4,0) = 0x1890f000
32595 mprotect(0x18918000,28672,PROT_NONE) = 0
32595 mmap2(0x1891f000,8192,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_DENYWRITE|MAP_FIXED,4,0x8) = 0x1891f000
32595 close(4) = 0
32595 mmap2(NULL,4096,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANONYMOUS,-1,0) = 0x18921000
32595 mprotect(0x18909000,8192,PROT_READ) = 0
32595 mprotect(0x1891f000,4096,PROT_READ) = 0
32595 mprotect(0x00010000,4096,PROT_READ) = 0
32595 mprotect(0x18820000,4096,PROT_READ) = 0
32595 munmap(0x18823000,14915) = 0
32595 rt_sigaction(SIGSEGV,0x18800a00,0x18800a8c) = 0
32595 rt_sigaction(70x18800a00,0x18800a8c) = 0
32595 Unknown syscall 369
32595 ugetrlimit(2,411044936,412226752,38,411044960,2) = 0
32595 Unknown syscall 369
32595 setrlimit(2,411044936,5000000,0,411044960,2) = 0
32595 Unknown syscall 369
32595 ugetrlimit(9,411044936,412226752,38,411044960,9) = 0
32595 Unknown syscall 369
32595 setrlimit(9,411044936,5000000,0,411044960,9) = 0
32595 Unknown syscall 369
32595 ugetrlimit(3,411044936,412226752,38,411044960,3) = 0
32595 Unknown syscall 369
32595 setrlimit(3,411044936,5000000,0,411044960,3) = 0
32595 mmap2(NULL,5001216,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANONYMOUS,-1,0) = 0xfffffff4

*** this is where we deadlock otherwise ***

32595 exit_group(1)


Alex
Fabio Erculiani - Jan. 15, 2012, 6:14 p.m.
I confirm the patch fixes the deadlock I was seeing here.
Peter Maydell - Feb. 21, 2012, 4:11 p.m.
On 13 January 2012 16:21, Alexander Graf <agraf@suse.de> wrote:
> On 13.01.2012, at 17:16, Peter Maydell wrote:
>> On 13 January 2012 15:46, Alexander Graf <agraf@suse.de> wrote:
>>> This patch forces unlocking of that lock in the segv handler. I'm not sure
>>> this is the right approach though. Maybe we should rather make sure we don't
>>> segfault in the code? I would greatly appreciate someone more intelligible
>>> than me to look at this :).
>>
>> A segfault while we're walking the TB chains in QEMU C code?
>> That's just a bug (and we know we have one there) -- we should
>> fix it rather than papering over it like this.
>
> Well, we're segfaulting in this exact special case which calls setrlimit() before an mmap which fails:

So what's actually happening here is not a problem with the failing
mmap requested by the guest, but with one we do ourselves slightly
later. The guest attempts to do a call to some code that hasn't been
translated yet (as it happens it's an attempt to call the tls function
in the ARM commpage at 0xffff0fe0, but that's not particularly
important). tb_gen_code() generates the code, and then calls
tb_link_page() to add it to our data structures. tb_link_page calls
tb_alloc_page() which calls page_find_alloc() which then needs
to allocate a level 2 page table entry in the l1_map[]. Unfortunately
(a) the mmap() it uses to do this fails because of the rlimit
(b) we don't check the mmap() return value, so we proceed on
with the bogus -1 and quickly segfault.

So "unlock and carry on regardless" is definitely wrong. We could
do the same as the system-mode (which is using g_malloc0) and
abort when the mmap() fails, although that won't help your test
program much I suspect.

For a proper fix we probably need to handle set/getrlimit for
RLIMIT_AS specially so we can apply them ourselves to the guest's
mmap/brk usage and don't get spurious allocation failures of
our private data structures...

-- PMM
Alexander Graf - Feb. 21, 2012, 4:19 p.m.
On 02/21/2012 05:11 PM, Peter Maydell wrote:
> On 13 January 2012 16:21, Alexander Graf<agraf@suse.de>  wrote:
>> On 13.01.2012, at 17:16, Peter Maydell wrote:
>>> On 13 January 2012 15:46, Alexander Graf<agraf@suse.de>  wrote:
>>>> This patch forces unlocking of that lock in the segv handler. I'm not sure
>>>> this is the right approach though. Maybe we should rather make sure we don't
>>>> segfault in the code? I would greatly appreciate someone more intelligible
>>>> than me to look at this :).
>>> A segfault while we're walking the TB chains in QEMU C code?
>>> That's just a bug (and we know we have one there) -- we should
>>> fix it rather than papering over it like this.
>> Well, we're segfaulting in this exact special case which calls setrlimit() before an mmap which fails:
> So what's actually happening here is not a problem with the failing
> mmap requested by the guest, but with one we do ourselves slightly
> later. The guest attempts to do a call to some code that hasn't been
> translated yet (as it happens it's an attempt to call the tls function
> in the ARM commpage at 0xffff0fe0, but that's not particularly
> important). tb_gen_code() generates the code, and then calls
> tb_link_page() to add it to our data structures. tb_link_page calls
> tb_alloc_page() which calls page_find_alloc() which then needs
> to allocate a level 2 page table entry in the l1_map[]. Unfortunately
> (a) the mmap() it uses to do this fails because of the rlimit
> (b) we don't check the mmap() return value, so we proceed on
> with the bogus -1 and quickly segfault.
>
> So "unlock and carry on regardless" is definitely wrong. We could
> do the same as the system-mode (which is using g_malloc0) and
> abort when the mmap() fails, although that won't help your test
> program much I suspect.
>
> For a proper fix we probably need to handle set/getrlimit for
> RLIMIT_AS specially so we can apply them ourselves to the guest's
> mmap/brk usage and don't get spurious allocation failures of
> our private data structures...

Or we preallocate structures we need in those critical paths. Or we 
manage an allocation failure by not linking pages.

Alex
Peter Maydell - Feb. 21, 2012, 4:21 p.m.
On 21 February 2012 16:19, Alexander Graf <agraf@suse.de> wrote:
> On 02/21/2012 05:11 PM, Peter Maydell wrote:
>> For a proper fix we probably need to handle set/getrlimit for
>> RLIMIT_AS specially so we can apply them ourselves to the guest's
>> mmap/brk usage and don't get spurious allocation failures of
>> our private data structures...
>
> Or we preallocate structures we need in those critical paths. Or we manage
> an allocation failure by not linking pages.

Yeah. Feel free to submit patches :-)

-- PMM

Patch

diff --git a/qemu-lock.h b/qemu-lock.h
index a72edda..e460e12 100644
--- a/qemu-lock.h
+++ b/qemu-lock.h
@@ -24,6 +24,12 @@ 
 #include <pthread.h>
 #define spin_lock pthread_mutex_lock
 #define spin_unlock pthread_mutex_unlock
+static inline void spin_unlock_safe(pthread_mutex_t *lock)
+{
+    /* unlocking an unlocked mutex results in undefined behavior */
+    pthread_mutex_trylock(lock);
+    pthread_mutex_unlock(lock);
+}
 #define spinlock_t pthread_mutex_t
 #define SPIN_LOCK_UNLOCKED PTHREAD_MUTEX_INITIALIZER
 
@@ -46,4 +52,8 @@  static inline void spin_unlock(spinlock_t *lock)
 {
 }
 
+static inline void spin_unlock_safe(spinlock_t *lock)
+{
+}
+
 #endif
diff --git a/user-exec.c b/user-exec.c
index abf6885..2826bd1 100644
--- a/user-exec.c
+++ b/user-exec.c
@@ -96,6 +96,10 @@  static inline int handle_cpu_signal(unsigned long pc, unsigned long address,
     qemu_printf("qemu: SIGSEGV pc=0x%08lx address=%08lx w=%d oldset=0x%08lx\n",
                 pc, address, is_write, *(unsigned long *)old_set);
 #endif
+
+    /* Maybe we're still holding the TB fiddling lock? */
+    spin_unlock_safe(&tb_lock);
+
     /* XXX: locking issue */
     if (is_write && page_unprotect(h2g(address), pc, puc)) {
         return 1;