
VFIO-VGA Issue

Message ID 1365548028.16420.180.camel@bling.home
State New

Commit Message

Alex Williamson April 9, 2013, 10:53 p.m. UTC
On Tue, 2013-04-09 at 18:33 -0400, deniv@lavabit.com wrote:
> Here's the debug output from qemu; the last lines repeat indefinitely:
> 
> vfio: vfio_initfn(0000:01:00.0) group 1
> vfio: region_add 0 - 7fffffff [0x7f605fe00000]
> vfio: SKIPPING region_add fec00000 - fec00fff
> vfio: SKIPPING region_add fed00000 - fed003ff
> vfio: SKIPPING region_add fee00000 - feefffff
> vfio: SKIPPING region_add fffe0000 - ffffffff
> vfio: Device 0000:01:00.0 flags: 3, regions: 9, irgs: 3
> vfio: Device 0000:01:00.0 region 0:
> vfio:   size: 0x10000000, offset: 0x0, flags: 0x7
> vfio: Device 0000:01:00.0 region 1:
> vfio:   size: 0x0, offset: 0x10000000000, flags: 0x0
> vfio: Device 0000:01:00.0 region 2:
> vfio:   size: 0x40000, offset: 0x20000000000, flags: 0x7
> vfio: Device 0000:01:00.0 region 3:
> vfio:   size: 0x0, offset: 0x30000000000, flags: 0x0
> vfio: Device 0000:01:00.0 region 4:
> vfio:   size: 0x100, offset: 0x40000000000, flags: 0x3
> vfio: Device 0000:01:00.0 region 5:
> vfio:   size: 0x0, offset: 0x50000000000, flags: 0x0
> vfio: Device 0000:01:00.0 ROM:
> vfio:   size: 0x20000, offset: 0x60000000000, flags: 0x1
> vfio: Device 0000:01:00.0 config:
> vfio:   size: 0x1000, offset: 0x70000000000, flags: 0x3
> vfio: vfio_load_rom(0000:01:00.0)
> vfio: vfio_bar_write(0000:01:00.0:BAR4+0x0, 0x4010, 4)
> vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4, 4) = 0xe000000c
> vfio: Enabled ATI/AMD quirk 0x4010 for device 0000:01:00.0
> vfio: Enabled ATI/AMD quirk 0x3c3 for device 0000:01:00.0
...
> vfio: vfio_vga_read(0x3c3, 1) = 0x0
> vfio: vfio_vga_read(0x3c3, 1) = 0x0
> vfio: vfio_vga_read(0x3c3, 1) = 0x0
> vfio: vfio_vga_read(0x3c3, 1) = 0x0
> vfio: vfio_vga_read(0x3c3, 1) = 0x0
> vfio: vfio_vga_read(0x3c3, 1) = 0x0
> vfio: vfio_vga_read(0x3c3, 1) = 0x0
> vfio: vfio_vga_read(0x3c3, 1) = 0x0
> vfio: vfio_vga_read(0x3c3, 1) = 0x0
> vfio: vfio_vga_read(0x3c3, 1) = 0x0
> vfio: vfio_vga_read(0x3c3, 1) = 0x0
> vfio: vfio_vga_read(0x3c3, 1) = 0x0
> vfio: vfio_vga_read(0x3c3, 1) = 0x0
> vfio: vfio_vga_read(0x3c3, 1) = 0x0
> vfio: vfio_vga_read(0x3c3, 1) = 0x0
> vfio: vfio_vga_read(0x3c3, 1) = 0x0
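The region dump above is easier to read once its flags and offsets are decoded. The sketch below is a minimal illustration. It assumes the flag bits of `struct vfio_region_info` from `<linux/vfio.h>` (READ = bit 0, WRITE = bit 1, MMAP = bit 2). It also assumes that the kernel's vfio-pci driver places each region at `index << 40` in the device file. The helper names are hypothetical.

```c
#include <stdint.h>

/* Flag bits of struct vfio_region_info, as defined in <linux/vfio.h>;
 * repeated here so the sketch builds without kernel headers. */
#define VFIO_REGION_INFO_FLAG_READ  (1u << 0)
#define VFIO_REGION_INFO_FLAG_WRITE (1u << 1)
#define VFIO_REGION_INFO_FLAG_MMAP  (1u << 2)

/* Assumption: the region index is encoded in the file offset as
 * index << 40 (the kernel vfio-pci driver's VFIO_PCI_OFFSET_SHIFT). */
#define REGION_OFFSET_SHIFT 40

/* Hypothetical helper: region index recovered from an "offset:" value. */
unsigned region_index(uint64_t offset)
{
    return (unsigned)(offset >> REGION_OFFSET_SHIFT);
}

/* Hypothetical helper: render flags as "rwm", '-' for each clear bit. */
void region_flags_str(uint32_t flags, char out[4])
{
    out[0] = (flags & VFIO_REGION_INFO_FLAG_READ)  ? 'r' : '-';
    out[1] = (flags & VFIO_REGION_INFO_FLAG_WRITE) ? 'w' : '-';
    out[2] = (flags & VFIO_REGION_INFO_FLAG_MMAP)  ? 'm' : '-';
    out[3] = '\0';
}
```

Applied to the dump, `region_index(0x40000000000)` gives 4, and flags 0x3 render as `rw-`. That matches the 0x100-byte I/O port BAR (region 4), which is readable and writable but not mmap'd. Under the same assumptions, the ROM (offset 0x60000000000, flags 0x1) decodes as read-only index 6, and config space decodes as index 7. Regions 1, 3 and 5 report size 0.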

This is a quirk that I haven't fully figured out yet.  ATI/AMD cards use
VGA register 0x3c3 to read the upper byte of the address of the I/O port
BAR, but sometimes it reads 0.  Try the patch below to have it always
return the virtual BAR address and let me know if it works.  Thanks,
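The quirk can be modelled in a few lines, together with what the debugging patch changes. This is a standalone sketch, not the `vfio_pci.c` code, and the parameter names are made up. One point is an assumption: the value the quirk compares against (`quirk->data` in the patch) is treated here as the upper byte of the physical I/O BAR. `PCI_BASE_ADDRESS_4` (0x20) is the standard config-space offset of BAR4.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offset of BAR4 in standard PCI config space (as in <linux/pci_regs.h>). */
#define PCI_BASE_ADDRESS_4 0x20

/*
 * Model of the emulated 0x3c3 read.
 *   config  : guest-visible config space
 *   hw_data : what the physical port 0x3c3 returned
 *   phys_hi : value the quirk compares against (assumed: upper byte of
 *             the physical I/O BAR)
 *   always  : true mimics the patch's "if (1 || data == quirk->data)"
 */
uint8_t ati_3c3_read(const uint8_t *config, uint8_t hw_data,
                     uint8_t phys_hi, bool always)
{
    if (always || hw_data == phys_hi) {
        /* Byte 1 of BAR4 holds address bits 15:8 of the virtual I/O BAR. */
        return config[PCI_BASE_ADDRESS_4 + 1];
    }
    return hw_data; /* no match: pass the hardware value through */
}
```

Without the patch, a hardware read of 0 that doesn't match the comparison value passes straight through to the guest. With the patch, the guest always sees the virtual upper byte. In the later log, that byte is 0xc0.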

Alex

Comments

deniv@lavabit.com April 10, 2013, 12:02 a.m. UTC | #1
> On Tue, 2013-04-09 at 18:33 -0400, deniv@lavabit.com wrote:
>> [...]
>
> This is a quirk that I haven't fully figured out yet.  ATI/AMD cards use
> VGA register 0x3c3 to read upper byte of the address of the I/O port
> BAR, but sometimes it reads 0.  Try the patch below to have it always
> return the virtual BAR address and let me know if it works.  Thanks,
>
> Alex
>
> --- a/hw/vfio_pci.c
> +++ b/hw/vfio_pci.c
> @@ -1117,7 +1117,7 @@ static uint64_t vfio_ati_3c3_quirk_read(void *opaque,
>      uint64_t data = vfio_vga_read(&vdev->vga.region[QEMU_PCI_VGA_IO_HI],
>                                    addr + 0x3, size);
>
> -    if (data == quirk->data) {
> +    if (1 || data == quirk->data) {
>          data = pci_get_byte(pdev->config + PCI_BASE_ADDRESS_4 + 1);
>          DPRINTF("%s(0x3c3, 1) = 0x%"PRIx64"\n", __func__, data);
>      }
>
>

Although I get much further with this patch, the monitor still doesn't sync. I
also see side effects on my host GPU (Intel HD4000): starting qemu ruins the
colors (black turns green, blue turns purple, etc.), though switching to the
Linux console and back to Xorg fixes it.

The debug output is huge this time, with most of the lines being variations of
those below. At line 58515 qemu hangs again, and the last two blocks of
this snippet repeat indefinitely:
...
vfio: vfio_vga_read(0x3c3, 1) = 0x0
vfio: vfio_ati_3c3_quirk_read(0x3c3, 1) = 0xc0
vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4c, 4) = 0x0
vfio: vfio_bar_write(0000:01:00.0:BAR4+0x0, 0x1730, 4)
vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4, 4) = 0x1
vfio: vfio_vga_read(0x3c3, 1) = 0x0
vfio: vfio_ati_3c3_quirk_read(0x3c3, 1) = 0xc0
vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4c, 4) = 0x0
vfio: vfio_bar_write(0000:01:00.0:BAR4+0x0, 0xbfffb808, 4)
vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4, 4) = 0x80
vfio: vfio_vga_read(0x3c3, 1) = 0x0
vfio: vfio_ati_3c3_quirk_read(0x3c3, 1) = 0xc0
vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4c, 4) = 0x0
vfio: vfio_bar_write(0000:01:00.0:BAR4+0x0, 0xbfffbc08, 4)
vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4, 4) = 0x200
vfio: vfio_vga_read(0x3c3, 1) = 0x0
vfio: vfio_ati_3c3_quirk_read(0x3c3, 1) = 0xc0
vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4c, 4) = 0x0
vfio: vfio_bar_write(0000:01:00.0:BAR4+0x0, 0x1730, 4)
vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4, 4) = 0x1
vfio: vfio_vga_read(0x3c3, 1) = 0x0
vfio: vfio_ati_3c3_quirk_read(0x3c3, 1) = 0xc0
vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4c, 4) = 0x0
vfio: vfio_bar_write(0000:01:00.0:BAR4+0x0, 0xbfffbc08, 4)
vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4, 4) = 0x200
vfio: vfio_vga_read(0x3c3, 1) = 0x0
vfio: vfio_ati_3c3_quirk_read(0x3c3, 1) = 0xc0
vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4c, 4) = 0x0
vfio: vfio_bar_write(0000:01:00.0:BAR4+0x0, 0xbfffa008, 4)
vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4, 4) = 0x1
vfio: vfio_vga_read(0x3c3, 1) = 0x0
vfio: vfio_ati_3c3_quirk_read(0x3c3, 1) = 0xc0
vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4c, 4) = 0x0
vfio: vfio_bar_write(0000:01:00.0:BAR4+0x0, 0x1730, 4)
vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4, 4) = 0x1
vfio: vfio_vga_write(0x3c4, 0x1, 1)
vfio: vfio_vga_read(0x3c5, 1) = 0x0
vfio: vfio_vga_write(0x3c5, 0x20, 1)
vfio: vfio_vga_read(0x3c3, 1) = 0x0
vfio: vfio_ati_3c3_quirk_read(0x3c3, 1) = 0xc0
vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4c, 4) = 0x0
vfio: vfio_bar_write(0000:01:00.0:BAR4+0x0, 0xbfffa2d4, 4)
vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4, 4) = 0x0
vfio: vfio_vga_read(0x3c3, 1) = 0x0
vfio: vfio_ati_3c3_quirk_read(0x3c3, 1) = 0xc0
vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4c, 4) = 0x0
vfio: vfio_bar_write(0000:01:00.0:BAR4+0x0, 0x6e74, 4)
vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4, 4) = 0x1
vfio: vfio_vga_read(0x3c3, 1) = 0x0
vfio: vfio_ati_3c3_quirk_read(0x3c3, 1) = 0xc0
vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4c, 4) = 0x0
vfio: vfio_bar_write(0000:01:00.0:BAR4+0x0, 0x6e74, 4)
vfio: vfio_bar_write(0000:01:00.0:BAR4+0x4, 0x1, 4)
vfio: vfio_vga_read(0x3c3, 1) = 0x0
vfio: vfio_ati_3c3_quirk_read(0x3c3, 1) = 0xc0
vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4c, 4) = 0x0
vfio: vfio_bar_write(0000:01:00.0:BAR4+0x0, 0x6e74, 4)
vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4, 4) = 0x1
vfio: vfio_vga_read(0x3c3, 1) = 0x0
vfio: vfio_ati_3c3_quirk_read(0x3c3, 1) = 0xc0
vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4c, 4) = 0x0
vfio: vfio_bar_write(0000:01:00.0:BAR4+0x0, 0x6e74, 4)
vfio: vfio_bar_write(0000:01:00.0:BAR4+0x4, 0x101, 4)
vfio: vfio_vga_read(0x3c3, 1) = 0x0
vfio: vfio_ati_3c3_quirk_read(0x3c3, 1) = 0xc0
vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4c, 4) = 0x0
vfio: vfio_bar_write(0000:01:00.0:BAR4+0x0, 0x6e70, 4)
vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4, 4) = 0x10410311
vfio: vfio_vga_read(0x3c3, 1) = 0x0
vfio: vfio_ati_3c3_quirk_read(0x3c3, 1) = 0xc0
vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4c, 4) = 0x0
vfio: vfio_bar_write(0000:01:00.0:BAR4+0x0, 0x6e8c, 4)
vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4, 4) = 0x10029
vfio: vfio_vga_read(0x3c3, 1) = 0x0
vfio: vfio_ati_3c3_quirk_read(0x3c3, 1) = 0xc0
vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4c, 4) = 0x0
vfio: vfio_bar_write(0000:01:00.0:BAR4+0x0, 0x6e8c, 4)
vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4, 4) = 0x10029
Alex Williamson April 10, 2013, 3:37 p.m. UTC | #2
On Tue, 2013-04-09 at 20:02 -0400, deniv@lavabit.com wrote:
> > On Tue, 2013-04-09 at 18:33 -0400, deniv@lavabit.com wrote:
> > [...]
> 
> Although with this patch I get much further, monitor still doesn't sync. I
> also see some effects on my host GPU (Intel HD4000). Starting qemu ruins
> colors (black turns green, blue turns purple, etc). Though switching to
> linux console and back to Xorg fixes it.

Hmm, that would seem to imply the VGA arbitration isn't working.
When I test, I'm not running anything on the boot graphics device; it's
sitting at a VT login.

> Debug output this time is huge with most of the lines being variations of
> those below. On line 58515 qemu hangs again, and the last two blocks from
> this snip repeat indefinitely:

Was this with or without KVM acceleration?  Is it hung, or does the code
below repeat indefinitely?  It can take a long time to get something
drawn to the screen with all this debug output, but the monitor should
get signal pretty quickly.  Check dmesg and look for errors reading the
ROM file; starting qemu w/o a ROM isn't going to get very far.
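A quick sanity check on the ROM complements looking at dmesg: read the image and confirm that it starts with the PCI expansion ROM signature bytes 0x55 0xAA. This is a hedged sketch, and the helper names are hypothetical. It assumes `path` points either at a ROM file you dumped yourself or at the device's sysfs attribute (e.g. `/sys/bus/pci/devices/0000:01:00.0/rom`). The sysfs attribute must be enabled by writing "1" to it before it can be read.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* A PCI expansion ROM image begins with the signature bytes 0x55 0xAA. */
int rom_has_signature(const uint8_t *buf, size_t len)
{
    return len >= 2 && buf[0] == 0x55 && buf[1] == 0xAA;
}

/*
 * Read up to 'cap' bytes of a ROM image from 'path' into 'buf'.
 * Returns the number of bytes read, or -1 if the file cannot be opened
 * or a read error occurs. Assumption: 'path' is a dumped ROM file or an
 * already-enabled sysfs "rom" attribute.
 */
long read_rom(const char *path, uint8_t *buf, size_t cap)
{
    FILE *f = fopen(path, "rb");
    if (!f) {
        perror(path);
        return -1;
    }
    size_t n = fread(buf, 1, cap, f);
    int failed = ferror(f);
    fclose(f);
    if (failed) {
        fprintf(stderr, "%s: read error\n", path);
        return -1;
    }
    return (long)n;
}
```

A failed read, a zero-length result, or a missing signature would all suggest that qemu ends up starting the card without a usable ROM.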

> ...
> vfio: vfio_vga_read(0x3c3, 1) = 0x0
> vfio: vfio_ati_3c3_quirk_read(0x3c3, 1) = 0xc0
> vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4c, 4) = 0x0
> vfio: vfio_bar_write(0000:01:00.0:BAR4+0x0, 0x6e8c, 4)
> vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4, 4) = 0x10029

Do you mean the above 5 lines are repeated or the whole sequence?  It
seems to be making some progress over the whole log, but I couldn't tell
you what it's doing.  Reading 0x3c3 to get the BAR4 address, then using
it to do some kind of access, is pretty typical behavior.  Things I would
try: if the HD4000 is not the bare-metal boot VGA device, make it so.
Blacklist the radeon driver in the host so that you give the card to
qemu in a "fresh" state.  Thanks,
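The five repeating lines can be read one plausible way, though nobody in the thread has documentation for the sequence. After fetching the I/O BAR's upper byte from 0x3c3, the guest writes a register offset (0x6e8c) to BAR4+0x0 and reads a value back from BAR4+0x4. It does this over and over, as if polling for a bit that never changes. The sketch below models that index/data pattern against a mock device. The mock, its names, and the polling interpretation are all assumptions, the register's meaning is unknown, and the BAR4+0x4c read is not modelled.

```c
#include <stdint.h>

#define MOCK_REGS 0x8000u  /* size of the fake register file, in dwords */

/* Hypothetical device: an I/O BAR with an index port at +0x0 and a
 * data port at +0x4, backed by a fake register file. */
struct mock_dev {
    uint32_t index;            /* last value written to the index port */
    uint32_t regs[MOCK_REGS];  /* register contents, by byte address / 4 */
};

/* 0x3c3 supplies the BAR's upper byte; a 256-byte I/O BAR is 256-byte
 * aligned, so its low byte is zero and the base is hi << 8. */
uint16_t io_base_from_3c3(uint8_t hi)
{
    return (uint16_t)(hi << 8);
}

void io_write32(struct mock_dev *d, uint16_t off, uint32_t val)
{
    if (off == 0x0)
        d->index = val;        /* select a register */
}

uint32_t io_read32(const struct mock_dev *d, uint16_t off)
{
    if (off == 0x4)            /* return the selected register */
        return d->regs[(d->index / 4) % MOCK_REGS];
    return 0;
}

/* One indirect access: the write/read pair that recurs in the log. */
uint32_t indirect_read(struct mock_dev *d, uint32_t reg)
{
    io_write32(d, 0x0, reg);
    return io_read32(d, 0x4);
}

/* Poll until (value & mask) == want. Returns the attempt that succeeded
 * (0 = first read), or -1 after max_tries failed reads. A guest loop
 * without such a bound would spin forever on a value that never changes. */
int poll_reg(struct mock_dev *d, uint32_t reg, uint32_t mask,
             uint32_t want, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        if ((indirect_read(d, reg) & mask) == want)
            return i;
    }
    return -1;
}
```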

Alex
deniv@lavabit.com April 10, 2013, 5:11 p.m. UTC | #3
>> Although with this patch I get much further, monitor still doesn't sync.
>> I also see some effects on my host GPU (Intel HD4000). Starting qemu ruins
>> colors (black turns green, blue turns purple, etc). Though switching to
>> linux console and back to Xorg fixes it.
>
> Hmm, seems like that would imply the VGA arbitration isn't working.
> When I test, I'm not running anything on the boot graphics device, it's
> sitting at a vt login.
I tried running from a VT (Xorg wasn't running). The colors were fine, but
passthrough didn't work.

>> Debug output this time is huge with most of the lines being variations of
>> those below. On line 58515 qemu hangs again, and the last two blocks from
>> this snip repeat indefinitely:
>
> Was this with or without KVM acceleration?
I've been testing without KVM acceleration since you said it can be buggy.

> Is it hung or does the code below repeat indefinitely?
The last five lines of the log below repeat indefinitely. The file I
piped the debug output to grew to 500 MB+ in no time.

> It can take a long time to get something
> drawn to the screen with all this debug output, but the monitor should
> get signal pretty quickly.
This monitor is connected via a VGA cable and is slow to change
resolutions, so I wouldn't have seen a brief signal before qemu hung
(or looped?). Waiting longer didn't produce anything new either.

> Check dmesg and look for errors reading the
> ROM file; starting qemu w/o a ROM isn't going to get very far.
There are only two lines produced on each qemu start:
[   95.331196] vfio-pci 0000:01:00.0: enabling device (0000 -> 0003)
[   95.354793] vfio_ecap_init: 0000:01:00.0 hiding ecap 0x19@0x270

>> vfio: vfio_vga_read(0x3c3, 1) = 0x0
>> vfio: vfio_ati_3c3_quirk_read(0x3c3, 1) = 0xc0
>> vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4c, 4) = 0x0
>> vfio: vfio_bar_write(0000:01:00.0:BAR4+0x0, 0x6e8c, 4)
>> vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4, 4) = 0x10029
>
> Do you mean the above 5 lines are repeated or the whole sequence?
I mean those 5 lines are repeated.

> It seems to be making some progress over the whole log, but I couldn't tell
> you what it's doing.  Reading 0x3c3 to get the BAR4 address, then using
> it to do some kind of access is pretty typical behavior.  Things I would
> try - If the HD4000 is not the bare metal boot VGA, make it so.
Do you mean I should boot into the Linux console with the VESA driver and run
qemu without starting Xorg? If starting qemu without Xorg from a KMS console on
the HD4000 is fine, I did that.

> Blacklist the radeon driver in the host so that you give the card to
> qemu in a "fresh" state.  Thanks,
>
> Alex
The radeon driver isn't installed, so there's nothing to blacklist. From the
kernel's .config:
# CONFIG_DRM_RADEON is not set
lspci -v shows no "Kernel driver in use:" line for the graphics card.


On a semi-related note, today I tried passing through this card in Xen.
With "gfx_passthru=1", Xen stopped the same way as qemu (one core was fully
busy, with nothing on the monitor). However, turning gfx_passthru off did
the trick. Win7 started loading with cirrus and switched to the HD7750
halfway through the boot process. I didn't do any real testing; I just let
Windows calculate its score. The result was 7.4 and Aero was working.

Also, in Xen's dmesg I saw these lines:
(XEN) Intel VT-d Snoop Control not enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
(XEN) Intel VT-d Queued Invalidation enabled.
(XEN) Intel VT-d Interrupt Remapping enabled.
(XEN) Intel VT-d Shared EPT tables not enabled.
Could those "not enabled" features be the cause of my issues? A quick
Google search shows that many people have Snoop Control enabled. One more
question: could the graphics card vendor (ASUS) make changes that aren't
present in the AMD reference design but that make it hard to pass through
the card?
Alex Williamson April 10, 2013, 6:30 p.m. UTC | #4
On Wed, 2013-04-10 at 13:11 -0400, deniv@lavabit.com wrote:
> >> Although with this patch I get much further, monitor still doesn't sync.
> >> I also see some effects on my host GPU (Intel HD4000). Starting qemu ruins
> >> colors (black turns green, blue turns purple, etc). Though switching to
> >> linux console and back to Xorg fixes it.
> >
> > Hmm, seems like that would imply the VGA arbitration isn't working.
> > When I test, I'm not running anything on the boot graphics device, it's
> > sitting at a vt login.
> I tried running from a vt (Xorg wasn't running). The colors were fine,
> passthrough didn't work.

OK, maybe Xorg is making use of VGA space but not using the VGA
arbiter.  That would be unfortunate.
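For reference, userspace programs are expected to coordinate legacy VGA access through the kernel's VGA arbiter device, `/dev/vga_arbiter`. They select a card with a `target` command and bracket their accesses with `lock`/`unlock`. The sketch below is minimal and assumes the command strings documented for the kernel VGA arbiter (`target PCI:<domain:bus:dev.fn>`, `lock io+mem`, `unlock io+mem`). The helper names are hypothetical. Whether Xorg on this host bypasses the arbiter is only a guess in the reply above.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Format a "target" command selecting one card, e.g. "0000:01:00.0".
 * Returns the command length, or -1 if it does not fit in 'buf'. */
int vgaarb_target_cmd(char *buf, size_t cap, const char *pci_addr)
{
    int n = snprintf(buf, cap, "target PCI:%s", pci_addr);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Write one command to the arbiter; returns 0 on success, -1 on error. */
int vgaarb_cmd(int fd, const char *cmd)
{
    size_t len = strlen(cmd);
    return write(fd, cmd, len) == (ssize_t)len ? 0 : -1;
}

/* Run 'access' while holding the card's legacy VGA I/O and memory
 * decoding. "lock" blocks until the arbiter grants it.
 * Returns 0 on success, -1 if the arbiter could not be used. */
int with_vga_locked(const char *pci_addr, void (*access)(void))
{
    char cmd[64];
    int fd = open("/dev/vga_arbiter", O_RDWR);
    if (fd < 0)
        return -1;
    if (vgaarb_target_cmd(cmd, sizeof cmd, pci_addr) < 0 ||
        vgaarb_cmd(fd, cmd) < 0 ||
        vgaarb_cmd(fd, "lock io+mem") < 0) {
        close(fd);
        return -1;
    }
    access();                        /* legacy VGA accesses go here */
    vgaarb_cmd(fd, "unlock io+mem");
    close(fd);
    return 0;
}
```

A client that pokes VGA ports without going through a lock like this can collide with another device's legacy ranges. That would be consistent with, though not proof of, the color corruption reported earlier.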

> >> Debug output this time is huge with most of the lines being variations of
> >> those below. On line 58515 qemu hangs again, and the last two blocks from
> >> this snip repeat indefinitely:
> >
> > Was this with or without KVM acceleration?
> I've been testing without KVM acceleration since you told it can be buggy.
> 
> > Is it hung or does the code below repeat indefinitely?
> The last five lines from the code below repeat indefinitely. The files
> which I used to pipe debug output grew in size to 500Mb+ in no time.
> 
> > It can take a long time to get something
> > drawn to the screen with all this debug output, but the monitor should
> > get signal pretty quickly.
> This monitor is connected via VGA cable, which is slow to change
> resolutions and such, so I couldn't see it even if there was a short timed
> signal before qemu hung (looped?). Also, waiting longer didn't give
> anything new.
> 
> > Check dmesg and look for errors reading the
> > ROM file; starting qemu w/o a ROM isn't going to get very far.
> There are only two lines produced on each qemu start:
> [   95.331196] vfio-pci 0000:01:00.0: enabling device (0000 -> 0003)
> [   95.354793] vfio_ecap_init: 0000:01:00.0 hiding ecap 0x19@0x270
> 
> >> vfio: vfio_vga_read(0x3c3, 1) = 0x0
> >> vfio: vfio_ati_3c3_quirk_read(0x3c3, 1) = 0xc0
> >> vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4c, 4) = 0x0
> >> vfio: vfio_bar_write(0000:01:00.0:BAR4+0x0, 0x6e8c, 4)
> >> vfio: vfio_bar_read(0000:01:00.0:BAR4+0x4, 4) = 0x10029
> >
> > Do you mean the above 5 lines are repeated or the whole sequence?
> I mean those 5 lines are repeated.

We can really only guess what that sequence means; I don't know of any
documentation or code to look at for it.

> > It seems to be making some progress over the whole log, but I couldn't tell
> > you what it's doing.  Reading 0x3c3 to get the BAR4 address, then using
> > it to do some kind of access is pretty typical behavior.  Things I would
> > try - If the HD4000 is not the bare metal boot VGA, make it so.
> Do you mean I should boot into linux console with VESA driver and run qemu
> without starting Xorg? If starting qemu without Xorg in KMS console on
> HD4000 is fine, I did that.

I just want to make sure nothing is using the ATI card.  If your BIOS,
bootloader, and KMS are shown on the IGD and you're not loading the
radeon FB driver (as you note below), this should be the case.

> > Blacklist the radeon driver in the host so that you give the card to
> > qemu in a "fresh" state.  Thanks,
> >
> > Alex
> The radeon driver isn't installed. There's nothing to blacklist. kernel's
> .config:
> # CONFIG_DRM_RADEON is not set
> lspci -v shows no "Kernel driver in use:" for the graphic card.
> 
> 
> On a semi-related note, today I tried passing through this card in Xen.
> With "gfx_passthru=1" Xen stopped the same way as qemu (one core was fully
> busy, with nothing on the monitor).

It's good to know we're not alone ;)

> However, turning gfx_passthru off did
> the trick. Win7 started loading with cirrus and switched to HD7750 halfway
> through boot process. I didn't do any testing just let Windows calculate
> its score. The result was 7.4 and Aero was working.

You should be able to do this with vfio too: use -vga cirrus and don't
use the x-vga option on vfio-pci.  The x-vga option enables legacy VGA
support for boot and the primary console; as a secondary head, normal PCI
device assignment should be sufficient.
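The two setups contrasted here differ only in their display options. The small sketch below assembles the arguments for each. It assumes qemu's `-vga` option and the `-device vfio-pci,host=...,x-vga=on` syntax, and it uses the device address 01:00.0 from the log. The helper is hypothetical and only builds strings; it does not launch qemu.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: fill 'dev' with the -device value and return the
 * -vga value.
 *   primary   : the assigned card is the guest's boot VGA
 *               (-vga none, x-vga=on)
 *   secondary : emulated Cirrus is primary and the card is a plain PCI
 *               device (-vga cirrus, no x-vga)
 * Returns NULL if 'dev' is too small.
 */
const char *vfio_display_args(bool primary, const char *host,
                              char *dev, size_t cap)
{
    int n = snprintf(dev, cap, "vfio-pci,host=%s%s", host,
                     primary ? ",x-vga=on" : "");
    if (n < 0 || (size_t)n >= cap)
        return NULL;
    return primary ? "none" : "cirrus";
}
```

The results would be passed as `-vga <returned value> -device <dev>`. Keeping both modes in one helper makes it clear that x-vga is the only switch between them.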

> Also, in Xen's dmesg I saw these lines:
> (XEN) Intel VT-d Snoop Control not enabled.
> (XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
> (XEN) Intel VT-d Queued Invalidation enabled.
> (XEN) Intel VT-d Interrupt Remapping enabled.
> (XEN) Intel VT-d Shared EPT tables not enabled.
> Can those "not enabled" features be the cause of my issues? Light search
> in google shows that many people have Snoop Control enabled.

Most of those are hardware features; I'd expect we're using the same ones.

>  And one more
> question, could the graphic card vendor (ASUS) make any changes that
> aren't present in the reference AMD design, but can make it hard to pass
> through the card?

I imagine each vendor has an opportunity to customize the BIOS, and
things could certainly be done there to cause problems.  I'd hope that
the core device initialization and VESA extension code is mostly intact
from the reference version, though.  Thanks,

Alex
deniv@lavabit.com April 10, 2013, 8:32 p.m. UTC | #5
>> However, turning gfx_passthru off did
>> the trick. Win7 started loading with cirrus and switched to HD7750 halfway
>> through boot process. I didn't do any testing just let Windows calculate
>> its score. The result was 7.4 and Aero was working.
>
> You should be able to do this with vfio too, use -vga cirrus and don't
> use the x-vga option on vfio-pci.  The x-vga enables legacy VGA
> support for boot and primary console, as a secondary head normal PCI
> device assignment should be sufficient.
>
Oh, how I wish that were true! Booting with cirrus and vfio-pci
results in a BSOD:
Attempt to reset the display driver and recover from timeout failed.

Trying the old pci-assign with KVM results in non-working graphics: Windows
shows Code 10, and sometimes Code 43, for the card.


P.S. I'm starting to go nuts because VGA assignment doesn't work. The
system was built with this sole intention, so now I'm considering buying
another graphics card. Can you suggest any consumer card that is easy to
pass through and can survive VM resets? This HD7750 just hung the host when
I tried destroying Xen's VM and starting it again.
Alex Williamson April 10, 2013, 8:42 p.m. UTC | #6
On Wed, 2013-04-10 at 16:32 -0400, deniv@lavabit.com wrote:
> >> However, turning gfx_passthru off did
> >> the trick. Win7 started loading with cirrus and switched to HD7750 halfway
> >> through boot process. I didn't do any testing just let Windows calculate
> >> its score. The result was 7.4 and Aero was working.
> >
> > You should be able to do this with vfio too, use -vga cirrus and don't
> > use the x-vga option on vfio-pci.  The x-vga enables legacy VGA
> > support for boot and primary console, as a secondary head normal PCI
> > device assignment should be sufficient.
> >
> Oh, how I wish it was true! Trying to load with cirrus and vfio-pci
> results in BSOD:
> Attempt to reset the display driver and recover from timeout failed.

Is this a fresh Windows install?  Windows doesn't like change and will
BSOD pretty easily when attaching graphics to an existing image.

> Trying the old pci-assign with kvm results in non-working GFX. Windows
> shows code 10 and sometimes code 43 for the card.

What happens if you don't use a q35 machine?  Windows is very particular
about the PCIe type of a device and will often show Code 10 if it
doesn't have a type compatible with a root complex.  Alternatively, you
can use the q35 config in the docs directory with the -readconfig option,
together with the bus= option on the pci-assign device, to place the
graphics behind a root port.
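The point about PCIe types can be illustrated by decoding the Device/Port Type field (bits 7:4 of the PCI Express Capabilities register) and checking it against where the device sits. This is a simplified rule of thumb and not Windows' actual check. It assumes that a function directly on the root complex's bus should be a Root Complex Integrated Endpoint, while an ordinary or legacy endpoint belongs below a root port. The constant values follow `<linux/pci_regs.h>`, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Device/Port Type field of the PCI Express Capabilities register
 * (bits 7:4), with type values as in <linux/pci_regs.h>. */
#define PCI_EXP_FLAGS_TYPE     0x00f0
#define PCI_EXP_TYPE_ENDPOINT  0x0   /* PCI Express Endpoint */
#define PCI_EXP_TYPE_LEG_END   0x1   /* Legacy PCI Express Endpoint */
#define PCI_EXP_TYPE_RC_END    0x9   /* Root Complex Integrated Endpoint */

/* Extract the Device/Port Type from the capabilities register value. */
unsigned pcie_port_type(uint16_t exp_flags)
{
    return (exp_flags & PCI_EXP_FLAGS_TYPE) >> 4;
}

/*
 * Simplified consistency check (an illustration, not Windows' logic):
 * an endpoint function should sit below a root port, while a function
 * placed directly on the root complex's bus should advertise itself as
 * a Root Complex Integrated Endpoint.
 */
bool pcie_type_fits(uint16_t exp_flags, bool behind_root_port)
{
    unsigned t = pcie_port_type(exp_flags);
    if (behind_root_port)
        return t == PCI_EXP_TYPE_ENDPOINT || t == PCI_EXP_TYPE_LEG_END;
    return t == PCI_EXP_TYPE_RC_END;
}
```

Under this rule, a card that reports an ordinary endpoint type fits once it is placed behind a root port, which is what the `-readconfig` plus `bus=` suggestion aims for.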

> P.S. I'm starting to go nuts because VGA assignment doesn't work. The
> system was built with this sole intention. So, now I'm considering buying
> another graphic card. Can you suggest any consumer card that is easy to
> pass and can leave through vm resets? This HD7750 just hung the host when
> I tried destroying Xen's VM and running it again.

Nope, VGA assignment is pretty bleeding edge here, but we've gotta start
somewhere.  Thanks,

Alex
deniv@lavabit.com April 11, 2013, 5:59 p.m. UTC | #7
> On Wed, 2013-04-10 at 16:32 -0400, deniv@lavabit.com wrote:
>> >> However, turning gfx_passthru off did
>> >> the trick. Win7 started loading with cirrus and switched to HD7750 halfway
>> >> through boot process. I didn't do any testing just let Windows calculate
>> >> its score. The result was 7.4 and Aero was working.
>> >
>> > You should be able to do this with vfio too, use -vga cirrus and don't
>> > use the x-vga option on vfio-pci.  The x-vga enables legacy VGA
>> > support for boot and primary console, as a secondary head normal PCI
>> > device assignment should be sufficient.
>> >
>> Oh, how I wish it was true! Trying to load with cirrus and vfio-pci
>> results in BSOD:
>> Attempt to reset the display driver and recover from timeout failed.
>
> Is this a fresh windows install?  Windows doesn't like change and will
> BSOD pretty easily when attaching graphics to an existing image.
>
>> Trying the old pci-assign with kvm results in non-working GFX. Windows
>> shows code 10 and sometimes code 43 for the card.
>
> What happens if you don't use a q35 machine?  Windows is very particular
> about the PCIe type of a device and will often show Code 10 if it
> doesn't have a type compatible with a root complex.  Alternatively you
> can use the q35 config in the docs directory with the -readconfig option
> and the bus= option on the pci-assign device to place the graphics
> behind a root port.
>

After many attempts, I have VGA passthrough working. Each test was
performed on a clean Win7 install with KVM enabled. Installation was done
with the i440fx machine type, which I switched to q35 after the virtio
drivers were installed. I did that because there's no AHCI driver for q35
yet, and Win7 doesn't have drivers for LSI SCSI.

1. q35/pc, vga=none, vfio-pci, x-vga=on. Nothing on the VM's screen; debug
output can be seen in the previous mails.
2. q35/pc, vga=cirrus, vfio-pci, x-vga=on. Screen corruption on the host
(correction to the previous mails: the corruption happens even in the KMS
console). Nothing on the VM's screen.
3. q35/pc, vga=cirrus, vfio-pci, no x-vga. BSOD: Attempt to reset the
display driver and recover from timeout failed.
4. q35, vga=cirrus, pci-assign. Windows boots, but the graphics card
doesn't start: Code 10.
5. pc, vga=cirrus, pci-assign. IT'S ALIVE! Ironically, this is the oldest
way to assign PCI devices in KVM. Why couldn't I make it work before?
Silly me!

After some testing, I can say that the card can tolerate guest restarts,
but it can also freeze the host in the process. I got such a freeze (after
a clean shutdown) about once in 10 restarts. On the other hand, killing
the guest during a heavy 3D benchmark and starting it again was fine.
Also, the card fails to clock up after a reboot, which causes poor
performance under heavy load. Thanks to the Xen wiki, I found that they
"fix" this by ejecting the graphics card from Windows' panel. Oh, and one
last thing: I once got Code 43 after a guest reboot, but rebooting the
guest again fixed it.

All in all, I'm quite happy now. I just hope the host freeze I had won't
happen again. Is it a bug in QEMU, the Catalyst driver, or VT-d?
Hopefully, q35 and vfio-pci will work some day. I wonder why q35 +
pci-assign doesn't work, and what the problem is with q35/pc + vfio-pci
(no x-vga). If vfio-pci is the direct replacement for pci-assign, it must
be a bug in vfio, huh?
Alex Williamson April 15, 2013, 6:48 p.m. UTC | #8
On Thu, 2013-04-11 at 13:59 -0400, deniv@lavabit.com wrote:
> > On Wed, 2013-04-10 at 16:32 -0400, deniv@lavabit.com wrote:
> >> >> However, turning gfx_passthru off did the trick. Win7 started
> >> >> loading with cirrus and switched to the HD7750 halfway through the
> >> >> boot process. I didn't do any testing, just let Windows calculate
> >> >> its score. The result was 7.4 and Aero was working.
> >> >
> >> > You should be able to do this with vfio too: use -vga cirrus and don't
> >> > use the x-vga option on pci-assign.  The x-vga option enables legacy
> >> > VGA support for boot and the primary console; for a secondary head,
> >> > normal PCI device assignment should be sufficient.
> >> >
> >> Oh, how I wish that were true! Trying to load with cirrus and vfio-pci
> >> results in a BSOD:
> >> Attempt to reset the display driver and recover from timeout failed.
> >
> > Is this a fresh windows install?  Windows doesn't like change and will
> > BSOD pretty easily when attaching graphics to an existing image.
> >
> >> Trying the old pci-assign with kvm results in non-working GFX. Windows
> >> shows code 10 and sometimes code 43 for the card.
> >
> > What happens if you don't use a q35 machine?  Windows is very particular
> > about the PCIe type of a device and will often show Code 10 if it
> > doesn't have a type compatible with a root complex.  Alternatively you
> > can use the q35 config in the docs directory with the -readconfig option
> > and the bus= option on the pci-assign device to place the graphics
> > behind a root port.
> >
> 
> After many attempts, I have VGA passthrough working. Each test was
> performed on a clean Win7 install with KVM enabled. Installation was done
> with the i440fx machine type, which I switched to q35 after the virtio
> drivers were installed. I did that because there's no AHCI driver for q35
> yet, and Win7 doesn't have drivers for LSI SCSI.
> 
> 1. q35/pc, vga=none, vfio-pci, x-vga=on. Nothing on the VM's screen; debug
> output can be seen in the previous mails.
> 2. q35/pc, vga=cirrus, vfio-pci, x-vga=on. Screen corruption on the host
> (correction to the previous mails: the corruption happens even in the KMS
> console). Nothing on the VM's screen.
> 3. q35/pc, vga=cirrus, vfio-pci, no x-vga. BSOD: Attempt to reset the
> display driver and recover from timeout failed.
> 4. q35, vga=cirrus, pci-assign. Windows boots, but the graphics card
> doesn't start: Code 10.
> 5. pc, vga=cirrus, pci-assign. IT'S ALIVE! Ironically, this is the oldest
> way to assign PCI devices in KVM. Why couldn't I make it work before?
> Silly me!

What does your /sys/kernel/iommu_groups look like for the group
containing your VGA device?  I'm curious if it includes the PCIe root
port and whether you're attaching those to vfio-pci or leaving them
bound to pcieport.  If the former, that may contribute to why you're
having problems with vfio-pci.

> After some testing, I can say that the card can tolerate guest restarts,
> but it can also freeze the host in the process. I got such a freeze (after
> a clean shutdown) about once in 10 restarts. On the other hand, killing
> the guest during a heavy 3D benchmark and starting it again was fine.
> Also, the card fails to clock up after a reboot, which causes poor
> performance under heavy load. Thanks to the Xen wiki, I found that they
> "fix" this by ejecting the graphics card from Windows' panel. Oh, and one
> last thing: I once got Code 43 after a guest reboot, but rebooting the
> guest again fixed it.
> 
> All in all, I'm quite happy now. I just hope the host freeze I had won't
> happen again. Is it a bug in QEMU, the Catalyst driver, or VT-d?
> Hopefully, q35 and vfio-pci will work some day. I wonder why q35 +
> pci-assign doesn't work,

This is likely because pci-assign doesn't mangle the PCI express type to
something Windows finds acceptable on a root complex.
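To make the "PCI express type" point concrete: the Device/Port Type is bits 7:4 of the PCI Express Capabilities register, two bytes into the PCIe capability (ID 0x10) in config space. The sketch below walks the standard capability list of a 256-byte config-space snapshot and, purely as an illustration of the kind of adjustment described here, rewrites a plain or legacy endpoint as a Root Complex Integrated Endpoint (type 0x9). It is a simplified assumption, not the actual vfio-pci or pci-assign code, which may adjust more than this one field; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    PCI_STATUS_OFF      = 0x06,  /* Status register */
    PCI_STATUS_CAP_LIST = 0x10,  /* Status bit: capability list present */
    PCI_CAP_PTR_OFF     = 0x34,  /* Capabilities pointer */
    PCI_CAP_ID_EXP      = 0x10,  /* PCI Express capability ID */
};

/* A few Device/Port Type values (PCIe Capabilities register, bits 7:4). */
enum {
    TYPE_ENDPOINT        = 0x0,
    TYPE_LEGACY_ENDPOINT = 0x1,
    TYPE_ROOT_PORT       = 0x4,
    TYPE_RC_INTEGRATED   = 0x9,  /* Root Complex Integrated Endpoint */
};

static uint16_t rd16(const uint8_t *cfg, size_t off)
{
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));  /* little-endian */
}

/* Walk the standard capability list; return the PCIe capability offset or 0. */
static uint8_t find_pcie_cap(const uint8_t *cfg)
{
    if (!(rd16(cfg, PCI_STATUS_OFF) & PCI_STATUS_CAP_LIST))
        return 0;
    uint8_t pos = cfg[PCI_CAP_PTR_OFF] & 0xfc;
    for (int guard = 0; pos >= 0x40 && guard < 48; guard++) {  /* loop guard */
        if (cfg[pos] == PCI_CAP_ID_EXP)
            return pos;
        pos = cfg[pos + 1] & 0xfc;
    }
    return 0;
}

/* Return the Device/Port Type, or -1 if there is no PCIe capability. */
static int pcie_port_type(const uint8_t *cfg)
{
    uint8_t pos = find_pcie_cap(cfg);
    return pos ? (rd16(cfg, pos + 2) >> 4) & 0xf : -1;
}

/* Illustrative only: report a (legacy) endpoint that sits on the root bus
 * as a Root Complex Integrated Endpoint instead. */
static void mangle_for_root_bus(uint8_t *cfg)
{
    uint8_t pos = find_pcie_cap(cfg);
    if (!pos)
        return;
    uint16_t flags = rd16(cfg, pos + 2);
    int type = (flags >> 4) & 0xf;
    if (type == TYPE_ENDPOINT || type == TYPE_LEGACY_ENDPOINT) {
        flags = (uint16_t)((flags & ~0x00f0) | (TYPE_RC_INTEGRATED << 4));
        cfg[pos + 2] = (uint8_t)(flags & 0xff);
        cfg[pos + 3] = (uint8_t)(flags >> 8);
    }
}
```

Only the type field is touched; the capability version in bits 3:0 is preserved by the mask.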

>  and what the problem is with q35/pc + vfio-pci (no x-vga). If vfio-pci
> is the direct replacement for pci-assign, it must be a bug in vfio, huh?

It's either a bug in vfio or the above configuration issue with the
pcieport.  I'd appreciate it if you could tell me how you're configuring
the group for use.  Thanks,

Alex
deniv@lavabit.com April 25, 2013, 6:15 p.m. UTC | #9
Sorry for the long delay.

Alex Williamson:
> On Thu, 2013-04-11 at 13:59 -0400, deniv@lavabit.com wrote:
>>> On Wed, 2013-04-10 at 16:32 -0400, deniv@lavabit.com wrote:
>>>>>> However, turning gfx_passthru off did the trick. Win7 started
>>>>>> loading with cirrus and switched to the HD7750 halfway through the
>>>>>> boot process. I didn't do any testing, just let Windows calculate
>>>>>> its score. The result was 7.4 and Aero was working.
>>>>>
>>>>> You should be able to do this with vfio too: use -vga cirrus and don't
>>>>> use the x-vga option on pci-assign.  The x-vga option enables legacy
>>>>> VGA support for boot and the primary console; for a secondary head,
>>>>> normal PCI device assignment should be sufficient.
>>>>>
>>>> Oh, how I wish that were true! Trying to load with cirrus and vfio-pci
>>>> results in a BSOD:
>>>> Attempt to reset the display driver and recover from timeout failed.
>>>
>>> Is this a fresh windows install?  Windows doesn't like change and will
>>> BSOD pretty easily when attaching graphics to an existing image.
>>>
>>>> Trying the old pci-assign with kvm results in non-working GFX. Windows
>>>> shows code 10 and sometimes code 43 for the card.
>>>
>>> What happens if you don't use a q35 machine?  Windows is very particular
>>> about the PCIe type of a device and will often show Code 10 if it
>>> doesn't have a type compatible with a root complex.  Alternatively you
>>> can use the q35 config in the docs directory with the -readconfig option
>>> and the bus= option on the pci-assign device to place the graphics
>>> behind a root port.
>>>
>>
>> After many attempts, I have VGA passthrough working. Each test was
>> performed on a clean Win7 install with KVM enabled. Installation was done
>> with the i440fx machine type, which I switched to q35 after the virtio
>> drivers were installed. I did that because there's no AHCI driver for q35
>> yet, and Win7 doesn't have drivers for LSI SCSI.
>>
>> 1. q35/pc, vga=none, vfio-pci, x-vga=on. Nothing on the VM's screen; debug
>> output can be seen in the previous mails.
>> 2. q35/pc, vga=cirrus, vfio-pci, x-vga=on. Screen corruption on the host
>> (correction to the previous mails: the corruption happens even in the KMS
>> console). Nothing on the VM's screen.
>> 3. q35/pc, vga=cirrus, vfio-pci, no x-vga. BSOD: Attempt to reset the
>> display driver and recover from timeout failed.
>> 4. q35, vga=cirrus, pci-assign. Windows boots, but the graphics card
>> doesn't start: Code 10.
>> 5. pc, vga=cirrus, pci-assign. IT'S ALIVE! Ironically, this is the oldest
>> way to assign PCI devices in KVM. Why couldn't I make it work before?
>> Silly me!
> 
> What does your /sys/kernel/iommu_groups look like for the group
> containing your VGA device?  I'm curious if it includes the PCIe root
> port and whether you're attaching those to vfio-pci or leaving them
> bound to pcieport.  If the former, that may contribute to why you're
> having problems with vfio-pci.
Yes, the group containing the VGA device includes the PCIe root port. No,
I did not attach it to vfio-pci. I just tried that, and it didn't make any
difference.

For what it's worth, I also noticed errors in dmesg output when Windows
BSODs (q35, vga=cirrus, vfio-pci, no x-vga). There are about ten
thousand lines of
---
dmar: DRHD: handling fault status reg 3
dmar: DMAR:[DMA Read] Request device [01:00.0] fault addr 1200b6000
DMAR:[fault reason 06] PTE Read access is not set
---
Fault addresses start at 11fff6000 (always the same) and go to about
1201b3000 (varies on each start).

Those read faults are followed by a bunch of
---
dmar: DRHD: handling fault status reg 3
dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr 11ffed000
DMAR:[fault reason 05] PTE Write access is not set
---
Fault addresses 11fef3000-11fff0000 (the last address varies).
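With ten thousand fault lines, it can help to reduce the log to an address range, as done by hand above. A minimal C sketch, assuming each relevant line contains the text "fault addr " followed by a hex address, as in the excerpts; the function and struct names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parse the hex address after "fault addr " in a DMAR log line.
 * Returns 1 and stores it in *addr on success, 0 if the line has none. */
static int parse_dmar_fault_addr(const char *line, uint64_t *addr)
{
    static const char key[] = "fault addr ";
    const char *p = strstr(line, key);
    if (!p)
        return 0;
    p += sizeof(key) - 1;
    char *end;
    unsigned long long v = strtoull(p, &end, 16);
    if (end == p)
        return 0;                      /* no hex digits followed the key */
    *addr = (uint64_t)v;
    return 1;
}

/* Running min/max over the fault addresses seen so far. */
struct fault_range {
    uint64_t lo, hi;
    unsigned long count;
};

static void fault_range_add(struct fault_range *r, uint64_t addr)
{
    if (r->count == 0 || addr < r->lo)
        r->lo = addr;
    if (r->count == 0 || addr > r->hi)
        r->hi = addr;
    r->count++;
}

/* Number of 4 KiB pages from the lowest to the highest faulting page. */
static uint64_t fault_range_pages(const struct fault_range *r)
{
    return r->count ? ((r->hi >> 12) - (r->lo >> 12)) + 1 : 0;
}
```

Fed the two endpoints of the read-fault range reported above (11fff6000 and 1201b3000), fault_range_pages() gives 446 pages.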
deniv@lavabit.com April 26, 2013, 12:02 p.m. UTC | #10
Alex Williamson:
> On Thu, 2013-04-25 at 11:38 +0000, deniv wrote:
>> Sorry for the long delay.
>>
>> Alex Williamson:
>>> On Thu, 2013-04-11 at 13:59 -0400, deniv@lavabit.com wrote:
>>>>> On Wed, 2013-04-10 at 16:32 -0400, deniv@lavabit.com wrote:
>>>>>>>> However, turning gfx_passthru off did the trick. Win7 started
>>>>>>>> loading with cirrus and switched to the HD7750 halfway through the
>>>>>>>> boot process. I didn't do any testing, just let Windows calculate
>>>>>>>> its score. The result was 7.4 and Aero was working.
>>>>>>>
>>>>>>> You should be able to do this with vfio too: use -vga cirrus and don't
>>>>>>> use the x-vga option on pci-assign.  The x-vga option enables legacy
>>>>>>> VGA support for boot and the primary console; for a secondary head,
>>>>>>> normal PCI device assignment should be sufficient.
>>>>>>>
>>>>>> Oh, how I wish that were true! Trying to load with cirrus and vfio-pci
>>>>>> results in a BSOD:
>>>>>> Attempt to reset the display driver and recover from timeout failed.
>>>>>
>>>>> Is this a fresh windows install?  Windows doesn't like change and will
>>>>> BSOD pretty easily when attaching graphics to an existing image.
>>>>>
>>>>>> Trying the old pci-assign with kvm results in non-working GFX. Windows
>>>>>> shows code 10 and sometimes code 43 for the card.
>>>>>
>>>>> What happens if you don't use a q35 machine?  Windows is very particular
>>>>> about the PCIe type of a device and will often show Code 10 if it
>>>>> doesn't have a type compatible with a root complex.  Alternatively you
>>>>> can use the q35 config in the docs directory with the -readconfig option
>>>>> and the bus= option on the pci-assign device to place the graphics
>>>>> behind a root port.
>>>>>
>>>>
>>>> After many attempts, I have VGA passthrough working. Each test was
>>>> performed on a clean Win7 install with KVM enabled. Installation was done
>>>> with the i440fx machine type, which I switched to q35 after the virtio
>>>> drivers were installed. I did that because there's no AHCI driver for q35
>>>> yet, and Win7 doesn't have drivers for LSI SCSI.
>>>>
>>>> 1. q35/pc, vga=none, vfio-pci, x-vga=on. Nothing on the VM's screen; debug
>>>> output can be seen in the previous mails.
>>>> 2. q35/pc, vga=cirrus, vfio-pci, x-vga=on. Screen corruption on the host
>>>> (correction to the previous mails: the corruption happens even in the KMS
>>>> console). Nothing on the VM's screen.
>>>> 3. q35/pc, vga=cirrus, vfio-pci, no x-vga. BSOD: Attempt to reset the
>>>> display driver and recover from timeout failed.
>>>> 4. q35, vga=cirrus, pci-assign. Windows boots, but the graphics card
>>>> doesn't start: Code 10.
>>>> 5. pc, vga=cirrus, pci-assign. IT'S ALIVE! Ironically, this is the oldest
>>>> way to assign PCI devices in KVM. Why couldn't I make it work before?
>>>> Silly me!
>>>
>>> What does your /sys/kernel/iommu_groups look like for the group
>>> containing your VGA device?  I'm curious if it includes the PCIe root
>>> port and whether you're attaching those to vfio-pci or leaving them
>>> bound to pcieport.  If the former, that may contribute to why you're
>>> having problems with vfio-pci.
>> Yes, the group containing the VGA device includes the PCIe root port. No,
>> I did not attach it to vfio-pci. I just tried that, and it didn't make any
>> difference.
> 
> You were right by not attaching it to vfio-pci.  Previous versions of
> vfio-pci still required it, but we've since found that causes more
> problems than it solves.  I wanted to make sure it wasn't attached.
> 
>> For what it's worth, I also noticed errors in dmesg output when Windows
>> BSODs (q35, vga=cirrus, vfio-pci, no x-vga). There are about ten
>> thousand lines of
>> ---
>> dmar: DRHD: handling fault status reg 3
>> dmar: DMAR:[DMA Read] Request device [01:00.0] fault addr 1200b6000
>> DMAR:[fault reason 06] PTE Read access is not set
>> ---
>> Fault addresses start at 11fff6000 (always the same) and go to about
>> 1201b3000 (varies on each start).
>>
>> Those read faults are followed by a bunch of
>> ---
>> dmar: DRHD: handling fault status reg 3
>> dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr 11ffed000
>> DMAR:[fault reason 05] PTE Write access is not set
>> ---
>> Fault addresses 11fef3000-11fff0000 (the last address varies).
> 
> And only with vfio-pci?  Very odd.
> 
> I've since gotten my HD7850 working with vfio-pci,x-vga=on with my Intel
> VT-d system.  Gleb is working on a patch to fix the emulator bug, but it
> can be avoided using the emulate_invalid_guest_state=0 module option to
> kvm_intel.
Okay. This time I used your next branch. The kernel command line included
"kvm_intel.emulate_invalid_guest_state=0".

> I see we've already mentioned it in this thread, but I'll reiterate the
> importance of not loading host drivers for graphics cards until we can
> reset them better.  Another user was having trouble with NVIDIA cards
> that was cleared up by not loading the host NVIDIA kernel driver.
As far as I know, nothing touches HD7750 before qemu comes in. My kernel
config: http://pastebin.com/sE73CMgH
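One way to confirm that the kvm_intel.emulate_invalid_guest_state=0 option above actually took effect is to read it back from sysfs. A sketch, assuming kvm_intel exposes that parameter as a readable bool under /sys/module/kvm_intel/parameters/ (bool module parameters read back as Y or N); both helper names are hypothetical.

```c
#include <stdio.h>

/* Interpret the text of a bool module parameter as shown in sysfs.
 * Returns 1 for Y/y/1, 0 for N/n/0, or -1 if unrecognised. */
static int parse_bool_param(const char *s)
{
    switch (s[0]) {
    case 'Y': case 'y': case '1':
        return 1;
    case 'N': case 'n': case '0':
        return 0;
    default:
        return -1;
    }
}

/* Read /sys/module/<module>/parameters/<param> as a bool.
 * Returns 1 or 0, or -1 if the file is missing or unreadable. */
static int read_bool_module_param(const char *module, const char *param)
{
    char path[256], buf[16];
    snprintf(path, sizeof(path), "/sys/module/%s/parameters/%s", module, param);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = fgets(buf, sizeof(buf), f) != NULL;
    fclose(f);
    return ok ? parse_bool_param(buf) : -1;
}
```

Under that assumption, read_bool_module_param("kvm_intel", "emulate_invalid_guest_state") would be expected to return 0 once the option is in effect.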

> I believe you're using the stock upstream kernel, 3.9-rc5 last I saw in
> this thread.  One potential difference between vfio-pci and pci-assign
> is that vfio does not allow access to unarchitected PCI config space.
> That is, regions not covered by capabilities.  This has also been shown
> to be a problem, so it's being fixed, but is not yet upstream.  I plan
> to push this in for 3.10, but it's already in my next branch:
I reran some tests with your branch.

pc/q35, vga=none, vfio-pci, x-vga=on. As before, without the 'if (1 ||
data == quirk->data)' change in hw/misc/vfio.c, QEMU freezes early. With
the change it goes a bit further, corrupts the main screen, and freezes
again.

pc, vga=cirrus, vfio-pci, no x-vga.
[  104.469331] vfio-pci 0000:01:00.0: enabling device (0000 -> 0003)
[  104.494873] vfio_ecap_init: 0000:01:00.0 hiding ecap 0x19@0x270
[  125.895842] vfio-pci 0000:01:00.0: irq 48 for MSI/MSI-X
And then a BSOD: Attempt to reset the display driver and recover from
timeout failed.
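The vfio_ecap_init line above refers to a PCIe extended capability with ID 0x19 (the Secondary PCI Express capability) at config offset 0x270. Each extended capability starts with a 32-bit header: bits 15:0 hold the ID, bits 19:16 the version, and bits 31:20 the offset of the next capability. The sketch below decodes such a header and shows one hypothetical way to hide a capability by relinking its predecessor past it; it is not the kernel's vfio_ecap_init() implementation, and the header values in the test are made up.

```c
#include <stdint.h>

/* Fields of a PCIe extended capability header (config offset >= 0x100). */
struct ecap_hdr {
    uint16_t id;       /* bits 15:0  capability ID, e.g. 0x19 */
    uint8_t  version;  /* bits 19:16 capability version */
    uint16_t next;     /* bits 31:20 offset of next capability, 0 = end */
};

static struct ecap_hdr decode_ecap_header(uint32_t hdr)
{
    struct ecap_hdr h;
    h.id      = (uint16_t)(hdr & 0xffff);
    h.version = (uint8_t)((hdr >> 16) & 0xf);
    h.next    = (uint16_t)((hdr >> 20) & 0xffc);  /* low 2 bits reserved */
    return h;
}

/* Return hdr with its next pointer replaced, keeping ID and version. */
static uint32_t ecap_set_next(uint32_t hdr, uint16_t next)
{
    return (hdr & 0x000fffffu) | ((uint32_t)(next & 0xffc) << 20);
}

/* Hypothetical hiding step: unlink 'hidden' from the chain by pointing the
 * capability before it at whatever followed it. */
static uint32_t ecap_skip(uint32_t prev_hdr, uint32_t hidden_hdr)
{
    return ecap_set_next(prev_hdr, decode_ecap_header(hidden_hdr).next);
}
```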

q35, vga=cirrus, vfio-pci, no x-vga. It's like the previous config, but
with many DMAR faults.
[   73.235513] vfio-pci 0000:01:00.0: enabling device (0000 -> 0003)
[   73.260776] vfio_ecap_init: 0000:01:00.0 hiding ecap 0x19@0x270
[   95.919689] vfio-pci 0000:01:00.0: irq 48 for MSI/MSI-X
[   97.642747] dmar: DRHD: handling fault status reg 3
[   97.642752] dmar: DMAR:[DMA Read] Request device [01:00.0] fault addr 11fff6000
[   97.642752] DMAR:[fault reason 06] PTE Read access is not set
..
+ BSOD.

q35, vga=cirrus, pci-assign. Windows boots, but the GPU doesn't start (Code 10).
pc, vga=cirrus, pci-assign. This is the working config.

Patch

--- a/hw/vfio_pci.c
+++ b/hw/vfio_pci.c
@@ -1117,7 +1117,7 @@  static uint64_t vfio_ati_3c3_quirk_read(void *opaque,
     uint64_t data = vfio_vga_read(&vdev->vga.region[QEMU_PCI_VGA_IO_HI],
                                   addr + 0x3, size);
 
-    if (data == quirk->data) {
+    if (1 || data == quirk->data) {
         data = pci_get_byte(pdev->config + PCI_BASE_ADDRESS_4 + 1);
         DPRINTF("%s(0x3c3, 1) = 0x%"PRIx64"\n", __func__, data);
     }
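The patch above changes which value a guest read of VGA port 0x3c3 returns. Reading the diff, the quirk substitutes byte 1 of BAR4 (PCI_BASE_ADDRESS_4 + 1, i.e. bits 15:8 of that BAR, since PCI config space is little-endian); the original code does this only when the hardware read equals quirk->data, while the experimental 'if (1 || ...)' always does. A standalone C sketch of that decision, with hypothetical function names and no QEMU dependencies:

```c
#include <stdint.h>

#define PCI_BASE_ADDRESS_4 0x20   /* standard config-space offset of BAR4 */

/* Byte substituted for the 0x3c3 read: bits 15:8 of BAR4. */
static uint8_t ati_3c3_value(const uint8_t *config)
{
    return config[PCI_BASE_ADDRESS_4 + 1];
}

/* Mirrors the patched condition: substitute BAR4's byte when the hardware
 * value matches quirk_data, or unconditionally when force is set (the
 * "1 ||" experiment). Otherwise pass the hardware value through. */
static uint8_t quirk_3c3_read(uint8_t hw_value, uint8_t quirk_data,
                              const uint8_t *config, int force)
{
    if (force || hw_value == quirk_data)
        return ati_3c3_value(config);
    return hw_value;
}
```

In the debug log at the top of the thread the 0x3c3 reads keep returning 0x0; if that did not match quirk->data, the unpatched check would presumably pass 0x0 straight through, which is the case forcing the substitution changes.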