From patchwork Fri Feb 22 03:03:56 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexey Korolev X-Patchwork-Id: 222448 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id E88CF2C02A2 for ; Fri, 22 Feb 2013 14:04:19 +1100 (EST) Received: from localhost ([::1]:48616 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1U8ivp-0008DO-NU for incoming@patchwork.ozlabs.org; Thu, 21 Feb 2013 22:04:17 -0500 Received: from eggs.gnu.org ([208.118.235.92]:45640) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1U8ivd-0008BV-P9 for qemu-devel@nongnu.org; Thu, 21 Feb 2013 22:04:10 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1U8ivV-0004eq-SK for qemu-devel@nongnu.org; Thu, 21 Feb 2013 22:04:05 -0500 Received: from mail-qa0-f44.google.com ([209.85.216.44]:43508) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1U8ivV-0004ef-NC for qemu-devel@nongnu.org; Thu, 21 Feb 2013 22:03:57 -0500 Received: by mail-qa0-f44.google.com with SMTP id bv4so207265qab.3 for ; Thu, 21 Feb 2013 19:03:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=yCikNNzqphpP6tmxxVi3BS1BNy4Q7r5R9UV2CvN+M48=; b=RDI3CX/NjHNqgcXQH9df0i03hiz6TyiIwdiQ5x4vh6HAqKOUSY54dO/anRuhSnhPdV qEggIYoDk8QEmRa55Ori7I1y8I8scPbVSVDZnCOs+8gi3yrhesehfgJlfQ4E7xSMjjmv LO2I/MJ2zGloPJILIT7hbMrLF7JVzuDdL+HM5S/KDxmEI84//F8o8sAUo1cf1VgLJjEl 2piiD9aFEQenPA+1nqkaJIpn4gK3Zhm6gnFmfgMqCHVl1PjWBaW9jwFtSacSF5jhBwuY hpNinbbyCI0tx16zHlDkjMmzqhq1DDyOc2ax2E0bS3+sMpNqOLcRS2FArAOZYbPBz/JC oM2g== MIME-Version: 1.0 X-Received: by 10.49.127.101 with SMTP id nf5mr176351qeb.20.1361502236851; Thu, 21 Feb 2013 19:03:56 -0800 (PST) Received: by 10.49.71.239 with HTTP; Thu, 21 Feb 2013 19:03:56 -0800 (PST) In-Reply-To: <20130219152057.GA30086@redhat.com> References: <511B1F5D.4060906@endace.com> <20130219152057.GA30086@redhat.com> Date: Fri, 22 Feb 2013 16:03:56 +1300 Message-ID: From: Alexey Korolev To: "Michael S. Tsirkin" X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy] X-Received-From: 209.85.216.44 Cc: Peter Maydell , gleb@redhat.com, jan.kiszka@siemens.com, "qemu-devel@nongnu.org" , Gerd Hoffmann , Alexey Korolev Subject: Re: [Qemu-devel] [PATCH for-1.4] pc: tag apic as overlap region X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org On Wed, Feb 20, 2013 at 4:20 AM, Michael S. Tsirkin wrote: > apic overlaps PCI space. On real hardware it has > higher priority, emulate this correctly. > > This should addresses the following issue: > >> Subject: Re: [BUG] Guest OS hangs on boot when 64bit BAR present (kvm-apic-msi resource conflict) >> Sometime ago I reported an issue about guest OS hang when 64bit BAR present. >> http://lists.gnu.org/archive/html/qemu-devel/2012-01/msg03189.html >> http://lists.gnu.org/archive/html/qemu-devel/2012-12/msg00413.html >> >> Some more investigation has been done, so in this post I'll try to explain why it happens and offer possible solutions: >> >> *When the issue happens* >> The issue occurs on Linux guest OS if kernel version <2.6.36 >> A Guest OS hangs on boot when a 64bit PCI BAR is present in a system (if we use ivshmem driver for example) and occupies range within first >> 4 GB. >> >> *How to reproduce* >> I used the following qemu command to reproduce the case: >> /usr/local/bin/qemu-system-x86_64 -M pc-1.3 -enable-kvm -m 2000 -smp 1,sockets=1,cores=1,threads=1 -name Rh5332 -chardev >> socket,id=charmonitor,path=/var/lib/libvirt/qemu/Rh5332.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=readline -rtc >> base=utc -boot cd -drive file=/home/akorolev/rh5332.img,if=none,id=drive-ide0-0-0,format=raw -device >> ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -chardev file,id=charserial0,path=/home/akorolev/serial.log -device >> isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -device ivshmem,shm,size=32M-device >> virtio-balloon-pci,id=balloon0 >> >> Tried different guests: Centos 5.8 64bit, RHEL 5.3 32bit, FC 12 64bit on all machines hang occurs in 100% cases >> >> *Why it happens* >> The issue basically comes from Linux PCI enumeration code. >> >> The OS enumerates 64BIT bars when device is enabled using the following procedure. >> 1. Write all FF's to lower half of 64bit BAR >> 2. Write address back to lower half of 64bit BAR >> 3. Write all FF's to higher half of 64bit BAR >> 4. Write address back to higher half of 64bit BAR >> >> For qemu it means that qemu pci_default_write_config() recevies all FFs for lower part of the 64bit BAR. >> Then it applies the mask and converts the value to "All FF's - size + 1" (FE000000 if size is 32MB). >> >> So for short period of time the range [0xFE000000 - 0xFFFFFFFF] will be occupied by ivshmem resource. >> For some reason it is lethal for further boot process. >> >> We have found that boot process screws up completely if kvm-apic-msi range is overlapped even for short period of time. (We still don't >> know why it happens, hope that the qemu maintainers can answer?) >> >> If we look at kvm-apic-msi memory region it is a non-overlapable memory region with hardcoded address range [0xFEE00000 - 0xFEF00000]. >> >> Here is a log we collected from render_memory_regions: >> >> system overlap 0 pri 0 [0x0 - 0x7fffffffffffffff] >> kvmvapic-rom overlap 1 pri 1000 [0xca000 - 0xcd000] >> pc.ram overlap 0 pri 0 [0xca000 - 0xcd000] >> ++ pc.ram [0xca000 - 0xcd000] is added to view >> .................... >> smram-region overlap 1 pri 1 [0xa0000 - 0xc0000] >> pci overlap 0 pri 0 [0xa0000 - 0xc0000] >> cirrus-lowmem-container overlap 1 pri 1 [0xa0000 - 0xc0000] >> cirrus-low-memory overlap 0 pri 0 [0xa0000 - 0xc0000] >> ++cirrus-low-memory [0xa0000 - 0xc0000] is added to view >> kvm-ioapic overlap 0 pri 0 [0xfec00000 - 0xfec01000] >> ++kvm-ioapic [0xfec00000 - 0xfec01000] is added to view >> pci-hole64 overlap 0 pri 0 [0x100000000 - 0x4000000100000000] >> pci overlap 0 pri 0 [0x100000000 - 0x4000000100000000] >> pci-hole overlap 0 pri 0 [0x7d000000 - 0x100000000] >> pci overlap 0 pri 0 [0x7d000000 - 0x100000000] >> ivshmem-bar2-container overlap 1 pri 1 [0xfe000000 - 0x100000000] >> ivshmem.bar2 overlap 0 pri 0 [0xfe000000 - 0x100000000] >> ++ivshmem.bar2 [0xfe000000 - 0xfec00000] is added to view >> ++ivshmem.bar2 [0xfec01000 - 0x100000000] is added to view >> ivshmem-mmio overlap 1 pri 1 [0xfebf1000 - 0xfebf1100] >> e1000-mmio overlap 1 pri 1 [0xfeba0000 - 0xfebc0000] >> cirrus-mmio overlap 1 pri 1 [0xfebf0000 - 0xfebf1000] >> cirrus-pci-bar0 overlap 1 pri 1 [0xfa000000 - 0xfc000000] >> vga.vram overlap 1 pri 1 [0xfa000000 - 0xfa800000] >> ++vga.vram [0xfa000000 - 0xfa800000] is added to view >> cirrus-bitblt-mmio overlap 0 pri 0 [0xfb000000 - 0xfb400000] >> ++cirrus-bitblt-mmio [0xfb000000 - 0xfb400000] is added to view >> cirrus-linear-io overlap 0 pri 0 [0xfa000000 - 0xfa800000] >> pc.bios overlap 0 pri 0 [0xfffe0000 - 0x100000000] >> ram-below-4g overlap 0 pri 0 [0x0 - 0x7d000000] >> pc.ram overlap 0 pri 0 [0x0 - 0x7d000000] >> ++pc.ram [0x0 - 0xa0000] is added to view >> ++pc.ram [0x100000 - 0x7d000000] is added to view >> kvm-apic-msi overlap 0 pri 0 [0xfee00000 - 0xfef00000] >> >> As you can see from log the kvm-apic-msi is enumarated last when range [0xfee00000 - 0xfef00000] is already occupied by ivshmem.bar2 >> [0xfec01000 - 0x100000000]. >> >> >> *Possible solutions* >> Solution 1. Probably the best would be adding the rule that regions which may not be overlapped are added to view first (In in other words >> regions which must not be overlapped have the highest priority). Please find patch in the following message. >> >> Solution 2. Raise priority of kvm-apic-msi resource. This is a bit misleading solution, as priority is only applicable for overlap-able >> regions, but this region must not be overlapped. >> >> Solution 3. Fix the issue at PCI level. Track if the resource is 64bit and apply changes if both parts of 64bit BAR are programmed. (It >> appears that real PCI bus controllers are smart enough to track 64bit BAR writes on PC, so qemu could do the same? Drawbacks are that >> tracking PCI writes is bit cumbersome, and such tracking may appear to somebody as a hack) > > --- > > The following patch is under test ATM. > Alexey, does it address the crash for you? > > > diff --git a/hw/pc.c b/hw/pc.c > index 53cc173..745c89d 100644 > --- a/hw/pc.c > +++ b/hw/pc.c > @@ -1154,7 +1154,8 @@ void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name) > } > qdev_init_nofail(dev); > d = SYS_BUS_DEVICE(dev); > - sysbus_mmio_map(d, 0, 0xfec00000); > + /* APIC overlaps the PCI window. */ > + sysbus_mmio_map_overlap(d, 0, 0xfec00000, 1000); > > for (i = 0; i < IOAPIC_NUM_PINS; i++) { > gsi_state->ioapic_irq[i] = qdev_get_gpio_in(dev, i); > diff --git a/hw/sysbus.c b/hw/sysbus.c > index 6d9d1df..40bf352 100644 > --- a/hw/sysbus.c > +++ b/hw/sysbus.c > @@ -66,6 +66,26 @@ void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr) > dev->mmio[n].memory); > } > > +void sysbus_mmio_map_overlap(SysBusDevice *dev, int n, hwaddr addr, > + unsigned priority) > +{ > + assert(n >= 0 && n < dev->num_mmio); > + > + if (dev->mmio[n].addr == addr) { > + /* ??? region already mapped here. */ > + return; > + } > + if (dev->mmio[n].addr != (hwaddr)-1) { > + /* Unregister previous mapping. */ > + memory_region_del_subregion(get_system_memory(), dev->mmio[n].memory); > + } > + dev->mmio[n].addr = addr; > + memory_region_add_subregion_overlap(get_system_memory(), > + addr, > + dev->mmio[n].memory, > + priority); > +} > + > > /* Request an IRQ source. The actual IRQ object may be populated later. */ > void sysbus_init_irq(SysBusDevice *dev, qemu_irq *p) > diff --git a/hw/sysbus.h b/hw/sysbus.h > index a7fcded..2100bd7 100644 > --- a/hw/sysbus.h > +++ b/hw/sysbus.h > @@ -56,6 +56,8 @@ void sysbus_init_ioports(SysBusDevice *dev, pio_addr_t ioport, pio_addr_t size); > > void sysbus_connect_irq(SysBusDevice *dev, int n, qemu_irq irq); > void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr); > +void sysbus_mmio_map_overlap(SysBusDevice *dev, int n, hwaddr addr, > + unsigned priority); > void sysbus_add_memory(SysBusDevice *dev, hwaddr addr, > MemoryRegion *mem); > void sysbus_add_memory_overlap(SysBusDevice *dev, hwaddr addr, > Sorry for late response. It looks I replied just to Michael by occasion. This patch doesn't fix the problem as KVM APIC should have high priority (IO APIC is fine). This patch fixes the issue: (I can send it in a new thread to make apply process clear) -------- void sysbus_add_memory_overlap(SysBusDevice *dev, hwaddr addr, diff --git a/target-i386/cpu.c b/target-i386/cpu.c index dfcf86e..e66bf54 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -2078,7 +2078,7 @@ static void x86_cpu_apic_init(X86CPU *cpu, Error **errp) /* NOTE: the APIC is directly connected to the CPU - it is not on the global memory bus. */ /* XXX: what if the base changes? */ - sysbus_mmio_map(SYS_BUS_DEVICE(env->apic_state), 0, MSI_ADDR_BASE); + sysbus_mmio_map_overlap(SYS_BUS_DEVICE(env->apic_state), 0, MSI_ADDR_BASE, 0x1000); apic_mapped = 1; } } diff --git a/hw/sysbus.c b/hw/sysbus.c index 6d9d1df..50c7232 100644 --- a/hw/sysbus.c +++ b/hw/sysbus.c @@ -48,7 +48,8 @@ void sysbus_connect_irq(SysBusDevice *dev, int n, qemu_irq irq) } } -void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr) +static void sysbus_mmio_map_common(SysBusDevice *dev, int n, hwaddr addr, + bool may_overlap, unsigned priority) { assert(n >= 0 && n < dev->num_mmio); @@ -61,11 +62,29 @@ void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr) memory_region_del_subregion(get_system_memory(), dev->mmio[n].memory); } dev->mmio[n].addr = addr; - memory_region_add_subregion(get_system_memory(), - addr, - dev->mmio[n].memory); + if (may_overlap) { + memory_region_add_subregion_overlap(get_system_memory(), + addr, + dev->mmio[n].memory, + priority); + } + else { + memory_region_add_subregion(get_system_memory(), + addr, + dev->mmio[n].memory); + } } +void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr) +{ + sysbus_mmio_map_common(dev, n, addr, false, 0); +} + +void sysbus_mmio_map_overlap(SysBusDevice *dev, int n, hwaddr addr, + unsigned priority) +{ + sysbus_mmio_map_common(dev, n, addr, true, priority); +} /* Request an IRQ source. The actual IRQ object may be populated later. */ void sysbus_init_irq(SysBusDevice *dev, qemu_irq *p) diff --git a/hw/sysbus.h b/hw/sysbus.h index a7fcded..2100bd7 100644 --- a/hw/sysbus.h +++ b/hw/sysbus.h @@ -56,6 +56,8 @@ void sysbus_init_ioports(SysBusDevice *dev, pio_addr_t ioport, pio_addr_t size); void sysbus_connect_irq(SysBusDevice *dev, int n, qemu_irq irq); void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr); +void sysbus_mmio_map_overlap(SysBusDevice *dev, int n, hwaddr addr, + unsigned priority); void sysbus_add_memory(SysBusDevice *dev, hwaddr addr, MemoryRegion *mem);