
Multi GPU passthrough via VFIO

Message ID 1391635674.15608.13.camel@ul30vt.home
State New

Commit Message

Alex Williamson Feb. 5, 2014, 9:27 p.m. UTC
On Wed, 2014-02-05 at 22:10 +0100, Maik Broemme wrote:
> Hi Alex,
> 
> Alex Williamson <alex.williamson@redhat.com> wrote:
> > On Wed, 2014-02-05 at 19:59 +0100, Maik Broemme wrote:
> > > Hi,
> > > 
> > > currently VFIO with multi GPU passthrough is working partially and
> > > hopefully somebody has a hint about the problem. I'm doing passthrough
> > > of an AMD Radeon R9 290X and AMD Radeon 7870 GHz Edition to a single VM.
> > > 
> > > If the VM is running Linux this works quite well with radeon or fglrx
> > > driver. Please see 'dmesg' log attached, when using the radeon driver.
> > > If needed I can also post one with fglrx driver.
> > > 
> > > If I do the exact same passthrough to a Windows VM and use latest AMD
> > > Catalyst 14.1 (2/1/2014) or AMD Catalyst 13.12 (12/18/2013) I can get
> > > only the first device working (AMD R9 290X) with 'x-vga=on'. I don't
> > > enable 'x-vga=on' on second device as this should never work. :)
> > 
> > Why not?  The guest is able to change the VGA enable bit in the emulated
> > bridge registers and access VGA space of each device, just like happens
> > on bare metal.  You'll only get one device initialized from seabios, but
> > that's the same as would happen on bare metal as well.
> > 
> 
> Well it was just my guess as it would behave like most physical boxes
> in this case. :)
> 
> > > I see the
> > > BIOS boot screen and everything works fine except for the second GPU.
> > > The Windows Device Manager just shows "Code 12" for the second GPU
> > > and its HD Audio device. Code 12 means: "This device cannot find enough
> > > free resources that it can use".
> > 
> > I've seen the same using Nvidia GRID GPUs (w/o x-vga=on), but only with
> > the Q35 chipset model, Linux works, Windows reports Code 12.  I have no
> > idea why as all the PCI resources appear to be properly sized and
> > mapped.  FWIW, 2 GRID GPUs assigned to a guest do work with the 440FX
> > chipset model.  Beyond 2 we run out of MMIO resources below 4G and
> > something bad happens.
> > 
> 
> Interesting. I will try 440FX a bit later and see if this works. What I
> can also do is to post system resource conflicts from Windows, the AMD
> Catalyst Center has it integrated. Maybe this will help?

If you actually see conflicts, then yes.  With the Code 12 cases I've seen, I was
never able to identify a conflict.  The trouble with 440FX is that
you'll need to use pci-bridges to isolate the VGA space of each GPU.
Otherwise one card would need to be disabled to ensure that VGA accesses
go to the other.
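
For illustration, the pci-bridge layout described above could look roughly
like the following. This is only a sketch in Python that assembles a QEMU
argument list; it has not been run against real hardware. The helper name
build_440fx_gpu_args, the bridge ids and the chassis numbers are made up
for the example, the host addresses are the ones from the original command
line, and slot 1 is used on the assumption that slot 0 behind a pci-bridge
is reserved for its hot-plug controller.

```python
import shlex
from typing import List, Tuple


def build_440fx_gpu_args(gpus: List[Tuple[str, bool]]) -> List[str]:
    """Place each GPU (and its audio function) behind its own pci-bridge.

    `gpus` holds (host_bus_and_device, x_vga) pairs, e.g. ("01:00", True).
    All ids and slot numbers below are illustrative choices, not requirements.
    """
    args = ["-machine", "pc,accel=kvm"]  # "pc" selects the 440FX chipset model
    for index, (host, x_vga) in enumerate(gpus, start=1):
        bridge = f"bridge{index}"
        # One bridge per GPU, so the guest can route VGA to one card at a time.
        args += ["-device", f"pci-bridge,id={bridge},chassis_nr={index}"]
        gpu = f"vfio-pci,host={host}.0,bus={bridge},addr=01.0,multifunction=on"
        if x_vga:
            gpu += ",x-vga=on"
        args += ["-device", gpu]
        # HDMI audio function of the same card, same slot, function 1.
        args += ["-device", f"vfio-pci,host={host}.1,bus={bridge},addr=01.1"]
    return args


if __name__ == "__main__":
    print(shlex.join(build_440fx_gpu_args([("01:00", True), ("02:00", False)])))
```

The printed options would replace the ioh3420 and vfio-pci lines of the
Q35 invocation; everything else on the command line stays as it is.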

> > > QEMU is called in both cases via the following. I just replace the
> > > '-drive' accordingly.
> > > 
> > > /usr/bin/taskset -c 0,1,2,3 /usr/bin/qemu-system-x86_64 \
> > >   -machine q35,accel=kvm \
> > >   -enable-kvm \
> > >   -nodefaults \
> > >   -nographic \
> > >   -vga none \
> > >   -boot order=nc \
> > >   -cpu host \
> > >   -smp cores=4,threads=1,sockets=1 \
> > >   -m 8192 \
> > >   -rtc base=localtime \
> > >   -k de \
> > >   -drive file=/srv/kvm/linux-drive0.img,id=drive0,if=none,cache=none,aio=threads \
> > >   -mon chardev=monitor0 \
> > >   -chardev socket,id=monitor0,path=/tmp/linux.monitor,nowait,server \
> > >   -netdev tap,id=net0,vhost=on,helper=/usr/lib/qemu/qemu-bridge-helper \
> > >   -device virtio-net-pci,netdev=net0,mac=00:00:00:02:01:04 \
> > >   -device virtio-blk-pci,drive=drive0,ioeventfd=on \
> > >   -device ioh3420,bus=pcie.0,id=pcie0,port=1,chassis=1,multifunction=on \
> > >   -device ioh3420,bus=pcie.0,id=pcie1,port=2,chassis=2,multifunction=on \
> > >   -device vfio-pci,host=01:00.0,addr=00.0,bus=pcie0,multifunction=on,x-vga=on \
> > >   -device vfio-pci,host=01:00.1,addr=00.1,bus=pcie0 \
> > >   -device vfio-pci,host=02:00.0,addr=00.0,bus=pcie1,multifunction=on \
> > >   -device vfio-pci,host=02:00.1,addr=00.1,bus=pcie1 \
> > >   -no-reboot
> > > 
> > > My setup is the following:
> > > 
> > > Kernel: linux-3.13.1
> > > Seabios: seabios-git-rel.1.7.4.r51.g151d034 (5/2/2014)
> > > QEMU: qemu-git-2.0.r30666.g31db5b3 (5/2/2014)
> > > 
> > > Below is the 'lspci' output and I'm using the AMD Radeon HD 5430 as device
> > > for my local X server:
> > > 
> > > 00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI to PCI bridge (external gfx0 port B) (rev 02)
> > > 00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD/ATI] RD990 I/O Memory Management Unit (IOMMU)
> > > 00:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI to PCI bridge (PCI express gpp port B)
> > > 00:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI to PCI bridge (PCI express gpp port D)
> > > 00:09.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI to PCI bridge (PCI express gpp port H)
> > > 00:0d.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI to PCI bridge (external gfx1 port B)
> > > 00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] (rev 40)
> > > 00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
> > > 00:12.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
> > > 00:13.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
> > > 00:13.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
> > > 00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 SMBus Controller (rev 42)
> > > 00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 Azalia (Intel HDA) (rev 40)
> > > 00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 LPC host controller (rev 40)
> > > 00:14.4 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 PCI to PCI Bridge (rev 40)
> > > 00:14.5 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI2 Controller
> > > 00:15.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB700/SB800/SB900 PCI to PCI bridge (PCIE port 0)
> > > 00:15.1 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB700/SB800/SB900 PCI to PCI bridge (PCIE port 1)
> > > 00:15.2 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB900 PCI to PCI bridge (PCIE port 2)
> > > 00:15.3 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB900 PCI to PCI bridge (PCIE port 3)
> > > 00:16.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
> > > 00:16.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
> > > 00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 0
> > > 00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 1
> > > 00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 2
> > > 00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 3
> > > 00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 4
> > > 00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 5
> > > 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970]
> > > 01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aac8
> > > 02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Pitcairn XT [Radeon HD 7870 GHz Edition]
> > > 02:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
> > > 03:00.0 USB controller: Etron Technology, Inc. EJ168 USB 3.0 Host Controller (rev 01)
> > > 04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Park [Mobility Radeon HD 5430]
> > > 04:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cedar HDMI Audio [Radeon HD 5400/6300 Series]
> > > 06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
> > > 07:00.0 USB controller: Etron Technology, Inc. EJ168 USB 3.0 Host Controller (rev 01)
> > > 
> > > Another minor issue is that the R9 290X is not reset when the VM
> > > shuts down (neither Linux nor Windows), but this can be worked around by
> > > doing a "suspend-to-ram" between two starts. That's why I use the
> > > '-no-reboot' option in QEMU. The 7870 resets properly.
> > 
> > 
> > Is the NoSoftRst "-" on the 290X vs "+" on the 7870 in lspci -vvv by
> > chance?  Thanks,
> > 
> 
> Here are both. Funnily enough, it is the opposite of what you described. :)


Oops, yes.  Does this help?


I can't figure out why I coded it the way that I did.  Probably overly
targeting a specific device.  Thanks,

Alex

> root@homer:~# lspci -vvv -s 01:00.0 | grep NoSoftRst
> 		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> 
> root@homer:~# lspci -vvv -s 02:00.0 | grep NoSoftRst
> 		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
> 
> root@homer:~# lspci -vvv -s 01:00.0
> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970] (prog-if 00 [VGA controller])
> 	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 0b00
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 64 bytes
> 	Interrupt: pin A routed to IRQ 49
> 	Region 0: Memory at c0000000 (64-bit, prefetchable) [size=256M]
> 	Region 2: Memory at df800000 (64-bit, prefetchable) [size=8M]
> 	Region 4: I/O ports at be00 [size=256]
> 	Region 5: Memory at fdd80000 (32-bit, non-prefetchable) [size=256K]
> 	[virtual] Expansion ROM at d0000000 [disabled] [size=128K]
> 	Capabilities: [48] Vendor Specific Information: Len=08 <?>
> 	Capabilities: [50] Power Management version 3
> 		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
> 		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> 	Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
> 		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
> 			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> 		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> 			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
> 			MaxPayload 128 bytes, MaxReadReq 512 bytes
> 		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
> 		LnkCap:	Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
> 			ClockPM- Surprise- LLActRep- BwNot-
> 		LnkCtl:	ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
> 			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> 		LnkSta:	Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> 		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
> 		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
> 		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> 			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> 			 Compliance De-emphasis: -6dB
> 		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
> 			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> 	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> 		Address: 00000000fee00000  Data: 0000
> 	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
> 	Capabilities: [150 v2] Advanced Error Reporting
> 		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> 		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> 		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> 		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
> 	Capabilities: [270 v1] #19
> 	Capabilities: [2b0 v1] Address Translation Service (ATS)
> 		ATSCap:	Invalidate Queue Depth: 00
> 		ATSCtl:	Enable+, Smallest Translation Unit: 00
> 	Capabilities: [2c0 v1] #13
> 	Capabilities: [2d0 v1] #1b
> 	Kernel driver in use: vfio-pci
> 	Kernel modules: radeon
> 
> root@homer:~# lspci -vvv -s 02:00.0
> 02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Pitcairn XT [Radeon HD 7870 GHz Edition] (prog-if 00 [VGA controller])
> 	Subsystem: XFX Pine Group Inc. Device 3251
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 64 bytes
> 	Interrupt: pin A routed to IRQ 48
> 	Region 0: Memory at a0000000 (64-bit, prefetchable) [size=256M]
> 	Region 2: Memory at fda80000 (64-bit, non-prefetchable) [size=256K]
> 	Region 4: I/O ports at ee00 [size=256]
> 	[virtual] Expansion ROM at fda00000 [disabled] [size=128K]
> 	Capabilities: [48] Vendor Specific Information: Len=08 <?>
> 	Capabilities: [50] Power Management version 3
> 		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
> 		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
> 	Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
> 		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
> 			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> 		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> 			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
> 			MaxPayload 128 bytes, MaxReadReq 512 bytes
> 		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
> 		LnkCap:	Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
> 			ClockPM- Surprise- LLActRep- BwNot-
> 		LnkCtl:	ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
> 			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> 		LnkSta:	Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> 		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
> 		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
> 		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> 			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> 			 Compliance De-emphasis: -6dB
> 		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
> 			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> 	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> 		Address: 00000000fee00000  Data: 0000
> 	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
> 	Capabilities: [150 v2] Advanced Error Reporting
> 		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> 		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
> 		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> 		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
> 	Capabilities: [270 v1] #19
> 	Capabilities: [2b0 v1] Address Translation Service (ATS)
> 		ATSCap:	Invalidate Queue Depth: 00
> 		ATSCtl:	Enable+, Smallest Translation Unit: 00
> 	Capabilities: [2c0 v1] #13
> 	Capabilities: [2d0 v1] #1b
> 	Kernel driver in use: vfio-pci
> 	Kernel modules: radeon
> 
> > Alex 
> > 
> 
> --Maik
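
The NoSoftRst comparison above can be automated with a few lines of
Python. This is a minimal sketch: the function name no_soft_reset is made
up, and the two sample lines are copied from the lspci output quoted
above. The interpretation in the comments follows the PCI power management
definition of the bit: with NoSoftRst+ the device keeps its state across a
D3hot to D0 transition, so a power-state cycle does not reset it.

```python
import re
from typing import Optional

_NOSOFTRST = re.compile(r"NoSoftRst([+-])")


def no_soft_reset(lspci_text: str) -> Optional[bool]:
    """Return True for NoSoftRst+, False for NoSoftRst-, None if not found."""
    match = _NOSOFTRST.search(lspci_text)
    if match is None:
        return None
    return match.group(1) == "+"


# Status lines taken from the `lspci -vvv` output in this thread.
SAMPLES = {
    "01:00.0": "Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-",
    "02:00.0": "Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-",
}

for bdf, line in SAMPLES.items():
    flag = no_soft_reset(line)
    # NoSoftRst- means a D3hot -> D0 transition performs an internal reset.
    verdict = "no reset on D3hot->D0" if flag else "resets on D3hot->D0"
    print(f"{bdf}: NoSoftRst{'+' if flag else '-'} ({verdict})")
```

On a live host the same function could be fed the full output of
`lspci -vvv -s <device>` instead of the sample lines.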

Comments

Maik Broemme Feb. 5, 2014, 11:47 p.m. UTC | #1
Hi Alex,

Alex Williamson <alex.williamson@redhat.com> wrote:
> On Wed, 2014-02-05 at 22:10 +0100, Maik Broemme wrote:
> > Hi Alex,
> > 
> > Alex Williamson <alex.williamson@redhat.com> wrote:
> > > On Wed, 2014-02-05 at 19:59 +0100, Maik Broemme wrote:
> > > > Hi,
> > > > 
> > > > currently VFIO with multi GPU passthrough is working partially and
> > > > hopefully somebody has a hint about the problem. I'm doing passthrough
> > > > of an AMD Radeon R9 290X and AMD Radeon 7870 GHz Edition to a single VM.
> > > > 
> > > > If the VM is running Linux this works quite well with radeon or fglrx
> > > > driver. Please see 'dmesg' log attached, when using the radeon driver.
> > > > If needed I can also post one with fglrx driver.
> > > > 
> > > > If I do the exact same passthrough to a Windows VM and use latest AMD
> > > > Catalyst 14.1 (2/1/2014) or AMD Catalyst 13.12 (12/18/2013) I can get
> > > > only the first device working (AMD R9 290X) with 'x-vga=on'. I don't
> > > > enable 'x-vga=on' on second device as this should never work. :)
> > > 
> > > Why not?  The guest is able to change the VGA enable bit in the emulated
> > > bridge registers and access VGA space of each device, just like happens
> > > on bare metal.  You'll only get one device initialized from seabios, but
> > > that's the same as would happen on bare metal as well.
> > > 
> > 
> > Well it was just my guess as it would behave like most physical boxes
> > in this case. :)
> > 
> > > > I see the
> > > > BIOS boot screen and everything works fine except for the second GPU.
> > > > The Windows Device Manager just shows "Code 12" for the second GPU
> > > > and its HD Audio device. Code 12 means: "This device cannot find enough
> > > > free resources that it can use".
> > > 
> > > I've seen the same using Nvidia GRID GPUs (w/o x-vga=on), but only with
> > > the Q35 chipset model, Linux works, Windows reports Code 12.  I have no
> > > idea why as all the PCI resources appear to be properly sized and
> > > mapped.  FWIW, 2 GRID GPUs assigned to a guest do work with the 440FX
> > > chipset model.  Beyond 2 we run out of MMIO resources below 4G and
> > > something bad happens.
> > > 
> > 
> > Interesting. I will try 440FX a bit later and see if this works. What I
> > can also do is to post system resource conflicts from Windows, the AMD
> > Catalyst Center has it integrated. Maybe this will help?
> 
> If you actually see conflicts, then yes.  With the Code 12 cases I've seen, I was
> never able to identify a conflict.  The trouble with 440FX is that
> you'll need to use pci-bridges to isolate the VGA space of each GPU.
> Otherwise one card would need to be disabled to ensure that VGA accesses
> go to the other.
> 

Okay, I've collected all the necessary information (hopefully). It comes from
a German-language Windows, so just ask if anything is unclear. Please find it below:

- Conflicts:

I/O Port 0x000003C0-0x000003DF	AMD Radeon R9 200 Series
I/O Port 0x000003C0-0x000003DF	Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420

IRQ 10	AMD Radeon HD 7800 Series
IRQ 10	Intel(R) ICH9 Family SMBus Controller - 2930

Memory Address 0xFE800000-0xFE83FFFF	AMD Radeon R9 200 Series
Memory Address 0xFE800000-0xFE83FFFF	Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420

Memory Address 0xE0000000-0xEFFFFFFF	AMD Radeon HD 7800 Series
Memory Address 0xE0000000-0xEFFFFFFF	Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420

Memory Address 0xA0000-0xBFFFF	AMD Radeon R9 200 Series
Memory Address 0xA0000-0xBFFFF	PCI bus
Memory Address 0xA0000-0xBFFFF	Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420

Memory Address 0xC0000000-0xCFFFFFFF	AMD Radeon R9 200 Series
Memory Address 0xC0000000-0xCFFFFFFF	PCI bus
Memory Address 0xC0000000-0xCFFFFFFF	Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420

I/O Port 0x000003B0-0x000003BB	AMD Radeon R9 200 Series
I/O Port 0x000003B0-0x000003BB	Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420

Memory Address 0xFE600000-0xFE63FFFF	AMD Radeon HD 7800 Series
Memory Address 0xFE600000-0xFE63FFFF	Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420

I/O Port 0x0000C000-0x0000C0FF	AMD Radeon HD 7800 Series
I/O Port 0x0000C000-0x0000C0FF	Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420

I/O Port 0x0000D000-0x0000D0FF	AMD Radeon R9 200 Series
I/O Port 0x0000D000-0x0000D0FF	Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420
- Display devices:

Name			AMD Radeon R9 200 Series
PNP Device ID		PCI\VEN_1002&DEV_67B0&SUBSYS_0B001002&REV_00\4&2122111D&0&0018
Adapter Type		AMD Radeon Graphics Processor (0x67B0), Advanced Micro Devices, Inc. compatible
Adapter Description	AMD Radeon R9 200 Series
Adapter RAM		(1,048,576) bytes
Installed Drivers	aticfx64.dll,aticfx64.dll,aticfx64.dll,aticfx32,aticfx32,aticfx32,atiumd64.dll,atidxx64.dll,atidxx64.dll,atiumdag,atidxx32,atidxx32,atiumdva,atiumd6a.cap,atitmm64.dll
Driver Version		13.350.1005.0
INF File		oem7.inf (ati2mtag_Hawaii section)
Color Planes		Not Available
Color Table Entries	4294967296
Resolution		1920 x 1080 x 60 Hz
Bits/Pixel		32
Memory Address		0xC0000000-0xCFFFFFFF
Memory Address		0xD0000000-0xD07FFFFF
I/O Port		0x0000D000-0x0000D0FF
Memory Address		0xFE800000-0xFE83FFFF
IRQ Channel		IRQ 4294967287
I/O Port		0x000003B0-0x000003BB
I/O Port		0x000003C0-0x000003DF
Memory Address		0xA0000-0xBFFFF
Driver			c:\windows\system32\drivers\atikmpag.sys (8.14.1.6367, 622.00 KB (636,928 bytes), 31.01.2014 20:28)

Name			AMD Radeon HD 7800 Series
PNP Device ID		PCI\VEN_1002&DEV_6818&SUBSYS_32511682&REV_00\4&49049C7&0&0820
Adapter Type		Not Available, Advanced Micro Devices, Inc. compatible
Adapter Description	AMD Radeon HD 7800 Series
Adapter RAM		Not Available
Installed Drivers	aticfx64.dll,aticfx64.dll,aticfx64.dll,aticfx32,aticfx32,aticfx32,atiumd64.dll,atidxx64.dll,atidxx64.dll,atiumdag,atidxx32,atidxx32,atiumdva,atiumd6a.cap,atitmm64.dll
Driver Version		13.350.1005.0
INF File		oem7.inf (ati2mtag_R575B section)
Color Planes		Not Available
Color Table Entries	Not Available
Resolution		Not Available
Bits/Pixel		Not Available
Memory Address		0xE0000000-0xEFFFFFFF
Memory Address		0xFE600000-0xFE63FFFF
I/O Port		0x0000C000-0x0000C0FF
IRQ Channel		IRQ 10
Driver			c:\windows\system32\drivers\atikmpag.sys (8.14.1.6367, 622.00 KB (636,928 bytes), 31.01.2014 20:28)

- I/O:

0x00000000-0x00000CD7	PCI bus	OK
0x00000060-0x00000060	Standard PS/2 Keyboard	OK
0x00000064-0x00000064	Standard PS/2 Keyboard	OK
0x00000070-0x00000071	System CMOS/real time clock	OK
0x00000072-0x00000077	System CMOS/real time clock	OK
0x000002F8-0x000002FF	Communications Port (COM2)	OK
0x00000378-0x0000037F	Printer Port (LPT1)	OK
0x000003B0-0x000003BB	AMD Radeon R9 200 Series	OK
0x000003B0-0x000003BB	Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420	OK
0x000003C0-0x000003DF	AMD Radeon R9 200 Series	OK
0x000003C0-0x000003DF	Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420	OK
0x000003F2-0x000003F5	Standard floppy disk controller	OK
0x000003F7-0x000003F7	Standard floppy disk controller	OK
0x000003F8-0x000003FF	Communications Port (COM1)	OK
0x00000CD8-0x00000CF7	ACPI Module Device	OK
0x00000D00-0x0000FFFF	PCI bus	OK
0x0000B100-0x0000B13F	Intel(R) ICH9 Family SMBus Controller - 2930	OK
0x0000C000-0x0000C0FF	AMD Radeon HD 7800 Series	OK
0x0000C000-0x0000C0FF	Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420	OK
0x0000D000-0x0000D0FF	AMD Radeon R9 200 Series	OK
0x0000D000-0x0000D0FF	Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420	OK
0x0000E000-0x0000E03F	Red Hat VirtIO SCSI controller	OK
0x0000E080-0x0000E09F	Red Hat VirtIO Ethernet Adapter	OK
0x0000E0A0-0x0000E0BF	Standard AHCI 1.0 Serial ATA Controller	OK

- Memory:

0xC0000000-0xCFFFFFFF	AMD Radeon R9 200 Series	OK
0xC0000000-0xCFFFFFFF	PCI bus	OK
0xC0000000-0xCFFFFFFF	Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420	OK
0xD0000000-0xD07FFFFF	AMD Radeon R9 200 Series	OK
0xFE800000-0xFE83FFFF	AMD Radeon R9 200 Series	OK
0xFE800000-0xFE83FFFF	Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420	OK
0xFEA40000-0xFEA40FFF	Red Hat VirtIO Ethernet Adapter	OK
0xFED00000-0xFED003FF	High precision event timer	OK
0xFEA41000-0xFEA41FFF	Red Hat VirtIO SCSI controller	OK
0xE0000000-0xEFFFFFFF	AMD Radeon HD 7800 Series	OK
0xE0000000-0xEFFFFFFF	Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420	OK
0xFE600000-0xFE63FFFF	AMD Radeon HD 7800 Series	OK
0xFE600000-0xFE63FFFF	Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420	OK
0xFEA42000-0xFEA42FFF	Standard AHCI 1.0 Serial ATA Controller	OK
0xFE860000-0xFE863FFF	High Definition Audio Controller	OK
0xA0000-0xBFFFF	AMD Radeon R9 200 Series	OK
0xA0000-0xBFFFF	PCI bus	OK
0xA0000-0xBFFFF	Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420	OK
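
For a rough feel of how much address space below 4G the two cards occupy
in this guest, the memory ranges listed above can simply be added up. The
following is a small Python sketch; the helper name mmio_bytes_per_device
is made up, the ranges are copied from the listing, and the legacy VGA
window 0xA0000-0xBFFFF is left out on purpose because it is not a BAR.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (owner, first address, last address), copied from the memory listing.
GPU_RANGES = [
    ("AMD Radeon R9 200 Series", 0xC0000000, 0xCFFFFFFF),
    ("AMD Radeon R9 200 Series", 0xD0000000, 0xD07FFFFF),
    ("AMD Radeon R9 200 Series", 0xFE800000, 0xFE83FFFF),
    ("AMD Radeon HD 7800 Series", 0xE0000000, 0xEFFFFFFF),
    ("AMD Radeon HD 7800 Series", 0xFE600000, 0xFE63FFFF),
]


def mmio_bytes_per_device(ranges: Iterable[Tuple[str, int, int]]) -> Dict[str, int]:
    """Sum the size of every inclusive [start, end] range per owner."""
    totals: Dict[str, int] = defaultdict(int)
    for owner, start, end in ranges:
        totals[owner] += end - start + 1  # ranges are inclusive
    return dict(totals)


totals = mmio_bytes_per_device(GPU_RANGES)
for owner, size in totals.items():
    print(f"{owner}: {size / 2**20:.2f} MiB")
print(f"total: {sum(totals.values()) / 2**20:.2f} MiB")
```

With the ranges above this comes to 264.25 MiB for the R9 and 256.25 MiB
for the HD 7800, so 520.5 MiB of 32-bit MMIO space for the two GPUs alone.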

- IRQ:

IRQ 1	Standard PS/2 Keyboard	OK
IRQ 3	Communications Port (COM2)	OK
IRQ 4	Communications Port (COM1)	OK
IRQ 6	Standard floppy disk controller	OK
IRQ 8	System CMOS/real time clock	OK
IRQ 10	AMD Radeon HD 7800 Series	OK
IRQ 10	Intel(R) ICH9 Family SMBus Controller - 2930	OK
IRQ 12	PS/2 Compatible Mouse	OK
IRQ 16	Standard AHCI 1.0 Serial ATA Controller	OK
IRQ 20	High Definition Audio Controller	OK
IRQ 81	Microsoft ACPI-Compliant System	OK
IRQ 82	Microsoft ACPI-Compliant System	OK
IRQ 83	Microsoft ACPI-Compliant System	OK
IRQ 84	Microsoft ACPI-Compliant System	OK
IRQ 85	Microsoft ACPI-Compliant System	OK
IRQ 86	Microsoft ACPI-Compliant System	OK
IRQ 87	Microsoft ACPI-Compliant System	OK
IRQ 88	Microsoft ACPI-Compliant System	OK
IRQ 89	Microsoft ACPI-Compliant System	OK
IRQ 90	Microsoft ACPI-Compliant System	OK
IRQ 91	Microsoft ACPI-Compliant System	OK
IRQ 92	Microsoft ACPI-Compliant System	OK
IRQ 93	Microsoft ACPI-Compliant System	OK
IRQ 94	Microsoft ACPI-Compliant System	OK
IRQ 95	Microsoft ACPI-Compliant System	OK
IRQ 96	Microsoft ACPI-Compliant System	OK
IRQ 97	Microsoft ACPI-Compliant System	OK
IRQ 98	Microsoft ACPI-Compliant System	OK
IRQ 99	Microsoft ACPI-Compliant System	OK
IRQ 100	Microsoft ACPI-Compliant System	OK
IRQ 101	Microsoft ACPI-Compliant System	OK
IRQ 102	Microsoft ACPI-Compliant System	OK
IRQ 103	Microsoft ACPI-Compliant System	OK
IRQ 104	Microsoft ACPI-Compliant System	OK
IRQ 105	Microsoft ACPI-Compliant System	OK
IRQ 106	Microsoft ACPI-Compliant System	OK
IRQ 107	Microsoft ACPI-Compliant System	OK
IRQ 108	Microsoft ACPI-Compliant System	OK
IRQ 109	Microsoft ACPI-Compliant System	OK
IRQ 110	Microsoft ACPI-Compliant System	OK
IRQ 111	Microsoft ACPI-Compliant System	OK
IRQ 112	Microsoft ACPI-Compliant System	OK
IRQ 113	Microsoft ACPI-Compliant System	OK
IRQ 114	Microsoft ACPI-Compliant System	OK
IRQ 115	Microsoft ACPI-Compliant System	OK
IRQ 116	Microsoft ACPI-Compliant System	OK
IRQ 117	Microsoft ACPI-Compliant System	OK
IRQ 118	Microsoft ACPI-Compliant System	OK
IRQ 119	Microsoft ACPI-Compliant System	OK
IRQ 120	Microsoft ACPI-Compliant System	OK
IRQ 121	Microsoft ACPI-Compliant System	OK
IRQ 122	Microsoft ACPI-Compliant System	OK
IRQ 123	Microsoft ACPI-Compliant System	OK
IRQ 124	Microsoft ACPI-Compliant System	OK
IRQ 125	Microsoft ACPI-Compliant System	OK
IRQ 126	Microsoft ACPI-Compliant System	OK
IRQ 127	Microsoft ACPI-Compliant System	OK
IRQ 128	Microsoft ACPI-Compliant System	OK
IRQ 129	Microsoft ACPI-Compliant System	OK
IRQ 130	Microsoft ACPI-Compliant System	OK
IRQ 131	Microsoft ACPI-Compliant System	OK
IRQ 132	Microsoft ACPI-Compliant System	OK
IRQ 133	Microsoft ACPI-Compliant System	OK
IRQ 134	Microsoft ACPI-Compliant System	OK
IRQ 135	Microsoft ACPI-Compliant System	OK
IRQ 136	Microsoft ACPI-Compliant System	OK
IRQ 137	Microsoft ACPI-Compliant System	OK
IRQ 138	Microsoft ACPI-Compliant System	OK
IRQ 139	Microsoft ACPI-Compliant System	OK
IRQ 140	Microsoft ACPI-Compliant System	OK
IRQ 141	Microsoft ACPI-Compliant System	OK
IRQ 142	Microsoft ACPI-Compliant System	OK
IRQ 143	Microsoft ACPI-Compliant System	OK
IRQ 144	Microsoft ACPI-Compliant System	OK
IRQ 145	Microsoft ACPI-Compliant System	OK
IRQ 146	Microsoft ACPI-Compliant System	OK
IRQ 147	Microsoft ACPI-Compliant System	OK
IRQ 148	Microsoft ACPI-Compliant System	OK
IRQ 149	Microsoft ACPI-Compliant System	OK
IRQ 150	Microsoft ACPI-Compliant System	OK
IRQ 151	Microsoft ACPI-Compliant System	OK
IRQ 152	Microsoft ACPI-Compliant System	OK
IRQ 153	Microsoft ACPI-Compliant System	OK
IRQ 154	Microsoft ACPI-Compliant System	OK
IRQ 155	Microsoft ACPI-Compliant System	OK
IRQ 156	Microsoft ACPI-Compliant System	OK
IRQ 157	Microsoft ACPI-Compliant System	OK
IRQ 158	Microsoft ACPI-Compliant System	OK
IRQ 159	Microsoft ACPI-Compliant System	OK
IRQ 160	Microsoft ACPI-Compliant System	OK
IRQ 161	Microsoft ACPI-Compliant System	OK
IRQ 162	Microsoft ACPI-Compliant System	OK
IRQ 163	Microsoft ACPI-Compliant System	OK
IRQ 164	Microsoft ACPI-Compliant System	OK
IRQ 165	Microsoft ACPI-Compliant System	OK
IRQ 166	Microsoft ACPI-Compliant System	OK
IRQ 167	Microsoft ACPI-Compliant System	OK
IRQ 168	Microsoft ACPI-Compliant System	OK
IRQ 169	Microsoft ACPI-Compliant System	OK
IRQ 170	Microsoft ACPI-Compliant System	OK
IRQ 171	Microsoft ACPI-Compliant System	OK
IRQ 172	Microsoft ACPI-Compliant System	OK
IRQ 173	Microsoft ACPI-Compliant System	OK
IRQ 174	Microsoft ACPI-Compliant System	OK
IRQ 175	Microsoft ACPI-Compliant System	OK
IRQ 176	Microsoft ACPI-Compliant System	OK
IRQ 177	Microsoft ACPI-Compliant System	OK
IRQ 178	Microsoft ACPI-Compliant System	OK
IRQ 179	Microsoft ACPI-Compliant System	OK
IRQ 180	Microsoft ACPI-Compliant System	OK
IRQ 181	Microsoft ACPI-Compliant System	OK
IRQ 182	Microsoft ACPI-Compliant System	OK
IRQ 183	Microsoft ACPI-Compliant System	OK
IRQ 184	Microsoft ACPI-Compliant System	OK
IRQ 185	Microsoft ACPI-Compliant System	OK
IRQ 186	Microsoft ACPI-Compliant System	OK
IRQ 187	Microsoft ACPI-Compliant System	OK
IRQ 188	Microsoft ACPI-Compliant System	OK
IRQ 189	Microsoft ACPI-Compliant System	OK
IRQ 190	Microsoft ACPI-Compliant System	OK
IRQ 4294967287	AMD Radeon R9 200 Series	OK
IRQ 4294967288	Red Hat VirtIO Ethernet Adapter	OK
IRQ 4294967289	Red Hat VirtIO Ethernet Adapter	OK
IRQ 4294967290	Red Hat VirtIO Ethernet Adapter	OK
IRQ 4294967291	Red Hat VirtIO SCSI controller	OK
IRQ 4294967292	Red Hat VirtIO SCSI controller	OK
IRQ 4294967293	Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420	OK
IRQ 4294967294	Intel(R) 5520/5500/X58 I/O Hub PCI Express Root Port 0 - 3420	OK

I'm no expert, but it looks like Windows never enabled MSI for the second
card: 'info pci' in the QEMU monitor shows both with IRQ 10. I hope this
helps.
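
A note on the very large IRQ numbers in the listing: 4294967287 through
4294967294 are what small negative numbers look like when a 32-bit value
is printed as unsigned. Windows tools are commonly reported to show
message-signaled interrupts as negative IRQ numbers; reading the listing
that way is an assumption, not something the listing itself states. Under
that assumption, the Python sketch below (function names are made up for
illustration) separates the MSI users from the line-based ones.

```python
def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit value."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def interrupt_kind(irq: int) -> str:
    """Classify an IRQ from the listing.

    Assumption: a negative number marks a message-signaled interrupt.
    """
    if as_signed32(irq) < 0:
        return "message-signaled (MSI/MSI-X)"
    return "line-based (INTx)"


# IRQ values copied from the listing above.
LISTING = {
    "AMD Radeon R9 200 Series": 4294967287,
    "AMD Radeon HD 7800 Series": 10,
}

for device, irq in LISTING.items():
    print(f"{device}: IRQ {as_signed32(irq)} -> {interrupt_kind(irq)}")
```

Read this way, the R9 sits on IRQ -9 (message-signaled) while the HD 7800
is still on the shared line-based IRQ 10, which matches the observation
above.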

> > > > QEMU is called in both cases via the following. I just replace the
> > > > '-drive' accordingly.
> > > > 
> > > > /usr/bin/taskset -c 0,1,2,3 /usr/bin/qemu-system-x86_64 \
> > > >   -machine q35,accel=kvm \
> > > >   -enable-kvm \
> > > >   -nodefaults \
> > > >   -nographic \
> > > >   -vga none \
> > > >   -boot order=nc \
> > > >   -cpu host \
> > > >   -smp cores=4,threads=1,sockets=1 \
> > > >   -m 8192 \
> > > >   -rtc base=localtime \
> > > >   -k de \
> > > >   -drive file=/srv/kvm/linux-drive0.img,id=drive0,if=none,cache=none,aio=threads \
> > > >   -mon chardev=monitor0 \
> > > >   -chardev socket,id=monitor0,path=/tmp/linux.monitor,nowait,server \
> > > >   -netdev tap,id=net0,vhost=on,helper=/usr/lib/qemu/qemu-bridge-helper \
> > > >   -device virtio-net-pci,netdev=net0,mac=00:00:00:02:01:04 \
> > > >   -device virtio-blk-pci,drive=drive0,ioeventfd=on \
> > > >   -device ioh3420,bus=pcie.0,id=pcie0,port=1,chassis=1,multifunction=on \
> > > >   -device ioh3420,bus=pcie.0,id=pcie1,port=2,chassis=2,multifunction=on \
> > > >   -device vfio-pci,host=01:00.0,addr=00.0,bus=pcie0,multifunction=on,x-vga=on \
> > > >   -device vfio-pci,host=01:00.1,addr=00.1,bus=pcie0 \
> > > >   -device vfio-pci,host=02:00.0,addr=00.0,bus=pcie1,multifunction=on \
> > > >   -device vfio-pci,host=02:00.1,addr=00.1,bus=pcie1 \
> > > >   -no-reboot
> > > > 
> > > > My setup is the following:
> > > > 
> > > > Kernel: linux-3.13.1
> > > > Seabios: seabios-git-rel.1.7.4.r51.g151d034 (5/2/2014)
> > > > QEMU: qemu-git-2.0.r30666.g31db5b3 (5/2/2014)
> > > > 
> > > > Below is the 'lspci' output and I'm using the AMD Radeon HD 5430 as device
> > > > for my local X server:
> > > > 
> > > > 00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI to PCI bridge (external gfx0 port B) (rev 02)
> > > > 00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD/ATI] RD990 I/O Memory Management Unit (IOMMU)
> > > > 00:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI to PCI bridge (PCI express gpp port B)
> > > > 00:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI to PCI bridge (PCI express gpp port D)
> > > > 00:09.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI to PCI bridge (PCI express gpp port H)
> > > > 00:0d.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI to PCI bridge (external gfx1 port B)
> > > > 00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] (rev 40)
> > > > 00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
> > > > 00:12.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
> > > > 00:13.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
> > > > 00:13.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
> > > > 00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 SMBus Controller (rev 42)
> > > > 00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 Azalia (Intel HDA) (rev 40)
> > > > 00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 LPC host controller (rev 40)
> > > > 00:14.4 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 PCI to PCI Bridge (rev 40)
> > > > 00:14.5 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI2 Controller
> > > > 00:15.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB700/SB800/SB900 PCI to PCI bridge (PCIE port 0)
> > > > 00:15.1 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB700/SB800/SB900 PCI to PCI bridge (PCIE port 1)
> > > > 00:15.2 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB900 PCI to PCI bridge (PCIE port 2)
> > > > 00:15.3 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB900 PCI to PCI bridge (PCIE port 3)
> > > > 00:16.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
> > > > 00:16.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
> > > > 00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 0
> > > > 00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 1
> > > > 00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 2
> > > > 00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 3
> > > > 00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 4
> > > > 00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h Processor Function 5
> > > > 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970]
> > > > 01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aac8
> > > > 02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Pitcairn XT [Radeon HD 7870 GHz Edition]
> > > > 02:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
> > > > 03:00.0 USB controller: Etron Technology, Inc. EJ168 USB 3.0 Host Controller (rev 01)
> > > > 04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Park [Mobility Radeon HD 5430]
> > > > 04:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cedar HDMI Audio [Radeon HD 5400/6300 Series]
> > > > 06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
> > > > 07:00.0 USB controller: Etron Technology, Inc. EJ168 USB 3.0 Host Controller (rev 01)
> > > > 
> > > > Another minor issue is that the R9 290X is not reset during shutdown of
> > > > VM (neither Linux nor Windows) but it can be tricked with doing
> > > > "suspend-to-ram" between two starts. That's why I use '-no-reboot' option
> > > > in QEMU. The 7870 is doing the reset properly.
> > > 
> > > 
> > > Is the NoSoftRst "-" on the 290X vs "+" on the 7870 in lspci -vvv by
> > > chance?  Thanks,
> > > 
> > 
> > Here are both. It is funny it is opposite as you described. :)
> 
> 
> Oops, yes.  Does this help?
> 
> --- a/hw/misc/vfio.c
> +++ b/hw/misc/vfio.c
> @@ -3136,7 +3136,7 @@ static void vfio_pci_reset_handler(void *opaque)
>  
>      QLIST_FOREACH(group, &group_list, next) {
>          QLIST_FOREACH(vdev, &group->device_list, next) {
> -            if (!vdev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
> +            if (!vdev->reset_works || !vdev->has_flr) {
>                  vdev->needs_reset = true;
>              }
>          }
> 
> I can't figure out why I coded it the way that I did.  Probably overly
> targeting a specific device.  Thanks,
> 

This patch works absolutely fine. After applying it to my 'qemu-git', the
device reset works flawlessly. It would be great to push it upstream, as
it looks good.

> Alex
> 
> > root@homer:~# lspci -vvv -s 01:00.0 | grep NoSoftRst
> > 		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> > 
> > root@homer:~# lspci -vvv -s 02:00.0 | grep NoSoftRst
> > 		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
> > 
> > root@homer:~# lspci -vvv -s 01:00.0
> > 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970] (prog-if 00 [VGA controller])
> > 	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 0b00
> > 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
> > 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> > 	Latency: 0, Cache Line Size: 64 bytes
> > 	Interrupt: pin A routed to IRQ 49
> > 	Region 0: Memory at c0000000 (64-bit, prefetchable) [size=256M]
> > 	Region 2: Memory at df800000 (64-bit, prefetchable) [size=8M]
> > 	Region 4: I/O ports at be00 [size=256]
> > 	Region 5: Memory at fdd80000 (32-bit, non-prefetchable) [size=256K]
> > 	[virtual] Expansion ROM at d0000000 [disabled] [size=128K]
> > 	Capabilities: [48] Vendor Specific Information: Len=08 <?>
> > 	Capabilities: [50] Power Management version 3
> > 		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
> > 		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> > 	Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
> > 		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
> > 			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> > 		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> > 			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
> > 			MaxPayload 128 bytes, MaxReadReq 512 bytes
> > 		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
> > 		LnkCap:	Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
> > 			ClockPM- Surprise- LLActRep- BwNot-
> > 		LnkCtl:	ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
> > 			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > 		LnkSta:	Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> > 		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
> > 		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
> > 		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> > 			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> > 			 Compliance De-emphasis: -6dB
> > 		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
> > 			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> > 	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> > 		Address: 00000000fee00000  Data: 0000
> > 	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
> > 	Capabilities: [150 v2] Advanced Error Reporting
> > 		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> > 		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> > 		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> > 		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> > 		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> > 		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
> > 	Capabilities: [270 v1] #19
> > 	Capabilities: [2b0 v1] Address Translation Service (ATS)
> > 		ATSCap:	Invalidate Queue Depth: 00
> > 		ATSCtl:	Enable+, Smallest Translation Unit: 00
> > 	Capabilities: [2c0 v1] #13
> > 	Capabilities: [2d0 v1] #1b
> > 	Kernel driver in use: vfio-pci
> > 	Kernel modules: radeon
> > 
> > root@homer:~# lspci -vvv -s 02:00.0
> > 02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Pitcairn XT [Radeon HD 7870 GHz Edition] (prog-if 00 [VGA controller])
> > 	Subsystem: XFX Pine Group Inc. Device 3251
> > 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
> > 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> > 	Latency: 0, Cache Line Size: 64 bytes
> > 	Interrupt: pin A routed to IRQ 48
> > 	Region 0: Memory at a0000000 (64-bit, prefetchable) [size=256M]
> > 	Region 2: Memory at fda80000 (64-bit, non-prefetchable) [size=256K]
> > 	Region 4: I/O ports at ee00 [size=256]
> > 	[virtual] Expansion ROM at fda00000 [disabled] [size=128K]
> > 	Capabilities: [48] Vendor Specific Information: Len=08 <?>
> > 	Capabilities: [50] Power Management version 3
> > 		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
> > 		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
> > 	Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
> > 		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
> > 			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> > 		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> > 			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
> > 			MaxPayload 128 bytes, MaxReadReq 512 bytes
> > 		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
> > 		LnkCap:	Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
> > 			ClockPM- Surprise- LLActRep- BwNot-
> > 		LnkCtl:	ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
> > 			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > 		LnkSta:	Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> > 		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
> > 		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
> > 		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> > 			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> > 			 Compliance De-emphasis: -6dB
> > 		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
> > 			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> > 	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> > 		Address: 00000000fee00000  Data: 0000
> > 	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
> > 	Capabilities: [150 v2] Advanced Error Reporting
> > 		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> > 		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> > 		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> > 		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
> > 		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> > 		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
> > 	Capabilities: [270 v1] #19
> > 	Capabilities: [2b0 v1] Address Translation Service (ATS)
> > 		ATSCap:	Invalidate Queue Depth: 00
> > 		ATSCtl:	Enable+, Smallest Translation Unit: 00
> > 	Capabilities: [2c0 v1] #13
> > 	Capabilities: [2d0 v1] #1b
> > 	Kernel driver in use: vfio-pci
> > 	Kernel modules: radeon
> > 
> > > Alex 
> > > 
> > 
> > --Maik
> 
> 
> 

--Maik
Maik Broemme Feb. 6, 2014, 12:25 a.m. UTC | #2
Hi Alex,

Maik Broemme <mbroemme@parallels.com> wrote:
> > > > > Another minor issue is that the R9 290X is not reset during shutdown of
> > > > > VM (neither Linux nor Windows) but it can be tricked with doing
> > > > > "suspend-to-ram" between two starts. That's why I use '-no-reboot' option
> > > > > in QEMU. The 7870 is doing the reset properly.
> > > > 
> > > > 
> > > > Is the NoSoftRst "-" on the 290X vs "+" on the 7870 in lspci -vvv by
> > > > chance?  Thanks,
> > > > 
> > > 
> > > Here are both. It is funny it is opposite as you described. :)
> > 
> > 
> > Oops, yes.  Does this help?
> > 
> > --- a/hw/misc/vfio.c
> > +++ b/hw/misc/vfio.c
> > @@ -3136,7 +3136,7 @@ static void vfio_pci_reset_handler(void *opaque)
> >  
> >      QLIST_FOREACH(group, &group_list, next) {
> >          QLIST_FOREACH(vdev, &group->device_list, next) {
> > -            if (!vdev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
> > +            if (!vdev->reset_works || !vdev->has_flr) {
> >                  vdev->needs_reset = true;
> >              }
> >          }
> > 
> > I can't figure out why I coded it the way that I did.  Probably overly
> > targeting a specific device.  Thanks,
> > 
> 
> This patch works absolutely fine. After applying it to my 'qemu-git', the
> device reset works flawlessly. It would be great to push it upstream, as
> it looks good.
> 

Okay, sorry, I spoke too soon. It worked the first time, but now, even
after a clean reboot, it no longer works as expected, and the behavior
is very strange.

Windows:

  1st boot works fine - boot VGA and Windows ATI driver loaded, issue
      reboot and qemu stopped due to '-no-reboot'.

  2nd boot works partially - boot VGA and Windows ATI driver loaded but
      black screen, and my system becomes terribly slow and mostly
      unresponsive. Immediately after the ATI driver enables the
      device, my dmesg shows the following:

[  159.984324] vfio_ecap_init: 0000:01:00.0 hiding ecap 0x19@0x270
[  159.984340] vfio_ecap_init: 0000:01:00.0 hiding ecap 0x1b@0x2d0
[  160.129036] vfio_ecap_init: 0000:02:00.0 hiding ecap 0x19@0x270
[  160.129049] vfio_ecap_init: 0000:02:00.0 hiding ecap 0x1b@0x2d0
[  172.977677] kvm: zapping shadow pages for mmio generation wraparound
[  173.160174] br0: port 2(tap0) entered forwarding state
[  175.902967] vfio-pci 0000:01:00.0: irq 46 for MSI/MSI-X
[  188.340430] Clocksource tsc unstable (delta = -119654611 ns)
[  188.340511] Switched to clocksource hpet
[  191.088693] hpet1: lost 12 rtc interrupts
[  191.926555] hpet1: lost 25 rtc interrupts

  So your patch did indeed fix the reset issue of the boot VGA, but
  something else is wrong now. :)

Linux (fglrx):

  1st boot works fine - boot VGA, fglrx loads fine and X could be
      started, issue reboot via SSH and qemu stopped due to
      '-no-reboot'.

  2nd boot works partially - boot VGA, fglrx loads fine but X couldn't
      be started and fails with:

[   34.265111] fglrx_pci 0000:02:00.0: irq 50 for MSI/MSI-X
[   34.344313] <6>[fglrx] Firegl kernel thread PID: 318
[   34.344400] <6>[fglrx] Firegl kernel thread PID: 319
[   34.344478] <6>[fglrx] Firegl kernel thread PID: 320
[   34.344589] <6>[fglrx] IRQ 50 Enabled
[   34.356105] <6>[fglrx] Reserved FB block: Shared offset:0, size:1000000 
[   34.356107] <6>[fglrx] Reserved FB block: Unshared offset:fac3000, size:3000 
[   34.356109] <6>[fglrx] Reserved FB block: Unshared offset:fac6000, size:23a000 
[   34.356110] <6>[fglrx] Reserved FB block: Unshared offset:7fff4000, size:c000 
[   34.386436] fglrx_pci 0000:01:00.0: irq 51 for MSI/MSI-X
[   34.490902] <6>[fglrx] Firegl kernel thread PID: 321
[   34.490994] <6>[fglrx] Firegl kernel thread PID: 322
[   34.491069] <6>[fglrx] Firegl kernel thread PID: 323
[   34.491166] <6>[fglrx] IRQ 51 Enabled
[   34.505271] <6>[fglrx] Reserved FB block: Shared offset:0, size:1000000 
[   34.505273] <6>[fglrx] Reserved FB block: Unshared offset:f9c3000, size:3000 
[   34.505274] <6>[fglrx] Reserved FB block: Unshared offset:f9c6000, size:23a000 
[   34.505276] <6>[fglrx] Reserved FB block: Unshared offset:fc00000, size:100000 
[   34.505277] <6>[fglrx] Reserved FB block: Unshared offset:fff8000, size:8000 
[   34.505278] <6>[fglrx] Reserved FB block: Unshared offset:ffff4000, size:c000 
[   34.526198] BUG: unable to handle kernel paging request at ffff880c724e8008
[   34.526203] IP: [<ffffffffa0399af6>] TF_PhwCIslands_PopulateAndUploadSclkMclkDPMLevels+0x96/0x3d0 [fglrx]
[   34.526277] PGD 1b3e067 PUD 0 
[   34.526279] Oops: 0002 [#1] PREEMPT SMP 
[   34.526282] Modules linked in: mousedev crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev aesni_intel snd_hda_codec_hdmi aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_hda_intel microcode snd_hda_codec serio_raw psmouse parport_pc snd_hwdep snd_pcm parport snd_page_alloc processor snd_timer snd soundcore i2c_i801 intel_agp lpc_ich pcspkr intel_gtt i2c_core shpchp evdev fglrx(PO) amd_iommu_v2 button ext4 crc16 mbcache jbd2 atkbd libps2 virtio_blk virtio_net ahci libahci libata scsi_mod i8042 floppy serio virtio_pci virtio_ring virtio
[   34.526307] CPU: 1 PID: 316 Comm: X Tainted: P           O 3.13.1-2-ARCH #1
[   34.526309] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Bochs 01/01/2011
[   34.526311] task: ffff8800776e2d00 ti: ffff880037a28000 task.ti: ffff880037a28000
[   34.526312] RIP: 0010:[<ffffffffa0399af6>]  [<ffffffffa0399af6>] TF_PhwCIslands_PopulateAndUploadSclkMclkDPMLevels+0x96/0x3d0 [fglrx]
[   34.526353] RSP: 0018:ffff880037a29810  EFLAGS: 00010296
[   34.526354] RAX: 0000000000000001 RBX: ffff8800724e800c RCX: 0000000000000006
[   34.526356] RDX: 0000000000000003 RSI: 0000000000000002 RDI: ffff8800724e8264
[   34.526357] RBP: ffff88007b19a00c R08: 00000000000186a0 R09: 000000000001e848
[   34.526358] R10: 00000002fffffffd R11: 00000000ffffffff R12: 0000000000000001
[   34.526359] R13: ffff88007b19a00c R14: 0000000000000000 R15: ffff880037a298b0
[   34.526363] FS:  00007f0ba649b880(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
[   34.526365] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   34.526366] CR2: ffff880c724e8008 CR3: 0000000037998000 CR4: 00000000000406e0
[   34.526372] Stack:
[   34.526373]  ffff88007b19a2f4 ffff88007bffcd1c 0000000000000001 ffffffffa0322cf0
[   34.526375]  0000000000000000 0000000000000000 0000000000000000 ffff880077ed2c08
[   34.526378]  0000000000000000 ffff880077ed2c08 ffff880037a298a0 ffffffffa0327f14
[   34.526380] Call Trace:
[   34.526435]  [<ffffffffa0322cf0>] ? PHM_DispatchTable+0xf0/0x220 [fglrx]
[   34.526490]  [<ffffffffa0327f14>] ? PECI_NotifyDALPreAdapterClockChange+0x144/0x160 [fglrx]
[   34.526546]  [<ffffffffa031e321>] ? PHM_SetPowerState+0x31/0xc0 [fglrx]
[   34.526597]  [<ffffffffa0340a5b>] ? PSM_ApplyHardwareAttributes_Dynamic+0x9b/0xf0 [fglrx]
[   34.526651]  [<ffffffffa033fde9>] ? PSM_AdjustPowerState_Dynamic+0x169/0x540 [fglrx]
[   34.526668]  [<ffffffffa0322cf0>] ? PHM_DispatchTable+0xf0/0x220 [fglrx]
[   34.526668]  [<ffffffffa0342ee4>] ? PEM_ExcuteEventChain+0x64/0xe0 [fglrx]
[   34.526668]  [<ffffffffa0341302>] ? PEM_HandleEvent+0x92/0xd0 [fglrx]
[   34.526668]  [<ffffffffa03357c0>] ? PEM_CWDDEPM_NotifyEvent+0xe0/0x4d0 [fglrx]
[   34.526668]  [<ffffffffa0333869>] ? PP_Cwdde+0x109/0x180 [fglrx]
[   34.526668]  [<ffffffffa02091dc>] ? firegl_pplib_cwddepm+0xbc/0x130 [fglrx]
[   34.526668]  [<ffffffffa02092d9>] ? firegl_pplib_notify_event+0x89/0xd0 [fglrx]
[   34.526668]  [<ffffffffa020292f>] ? hal_init_gpu+0x2bf/0x480 [fglrx]
[   34.526668]  [<ffffffffa01dcc7b>] ? firegl_open+0x2db/0x310 [fglrx]
[   34.526668]  [<ffffffffa01cb287>] ? ip_firegl_open+0x17/0x20 [fglrx]
[   34.526668]  [<ffffffffa01ccac8>] ? firegl_stub_open+0x98/0x100 [fglrx]
[   34.526668]  [<ffffffff811a82bf>] ? chrdev_open+0x9f/0x1d0
[   34.526668]  [<ffffffff811a1967>] ? do_dentry_open+0x1b7/0x2c0
[   34.526668]  [<ffffffff811aed41>] ? __inode_permission+0x41/0xb0
[   34.526668]  [<ffffffff811a8220>] ? cdev_put+0x30/0x30
[   34.526668]  [<ffffffff811a1d91>] ? finish_open+0x31/0x40
[   34.526668]  [<ffffffff811b1b72>] ? do_last+0x572/0xe90
[   34.526668]  [<ffffffff811af036>] ? link_path_walk+0x236/0x8d0
[   34.526668]  [<ffffffff811b254b>] ? path_openat+0xbb/0x6b0
[   34.526668]  [<ffffffff811b3c6a>] ? do_filp_open+0x3a/0x90
[   34.526668]  [<ffffffff811c0567>] ? __alloc_fd+0xa7/0x130
[   34.526668]  [<ffffffff811a2f49>] ? do_sys_open+0x129/0x220
[   34.526668]  [<ffffffff811a305e>] ? SyS_open+0x1e/0x20
[   34.526668]  [<ffffffff8152136d>] ? system_call_fastpath+0x1a/0x1f
[   34.526668] Code: 8b 4a 1c 8b 93 e0 18 00 00 48 8d bb 58 02 00 00 85 d2 0f 84 63 02 00 00 f6 c2 01 0f 84 20 01 00 00 44 8b 1b 41 ff cb 4f 8d 14 5b <46> 89 44 93 08 8b 95 3c 02 00 00 48 89 d0 48 c1 e8 07 a8 01 75 
[   34.526668] RIP  [<ffffffffa0399af6>] TF_PhwCIslands_PopulateAndUploadSclkMclkDPMLevels+0x96/0x3d0 [fglrx]
[   34.526668]  RSP <ffff880037a29810>
[   34.526668] CR2: ffff880c724e8008
[   34.526668] ---[ end trace 5431e6dcf1c31dea ]---
[   69.317528] type=1006 audit(1391649552.046:4): pid=324 uid=0 old auid=4294967295 new auid=0 old ses=4294967295 new ses=3 res=1

I know it is the binary driver. I will also retry with the radeon driver,
but I believe there will be a similar crash. In my first try I just
rebooted the Linux VM several times without starting X.

I got it working one time without getting 'Clocksource tsc unstable', but
now I'm unable to reproduce it. So I believe something more is needed.
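The faulting address in the fglrx oops can be reproduced from the register
dump alone. The sketch below assumes my reading of the bytes around the
marked `<46>` in the "Code:" line is right: `mov (%rbx),%r11d`,
`dec %r11d`, `lea (%r11,%r11,2),%r10`, then the faulting store
`mov %r8d,0x8(%rbx,%r10,4)`. fglrx is closed source, so what the loaded
value means is a guess; only the arithmetic is checked here.

```python
MASK64 = (1 << 64) - 1

# Register values copied from the oops above.
RBX = 0xFFFF8800724E800C
R10 = 0x00000002FFFFFFFD
R11 = 0x00000000FFFFFFFF
CR2 = 0xFFFF880C724E8008  # faulting address reported by the kernel


def effective_address(base, index, scale, disp):
    """x86-64 base + index*scale + displacement, wrapped to 64 bits."""
    return (base + index * scale + disp) & MASK64


# Assumed decoding of the faulting store: 0x8(%rbx,%r10,4).
fault = effective_address(RBX, R10, 4, 8)
print(hex(fault), fault == CR2)

# R10 is three times R11, matching the assumed 'lea (%r11,%r11,2),%r10'.
print(R10 == 3 * R11)

# R11 == 0xffffffff is what a 32-bit decrement of zero produces.
print(R11 == (0 - 1) & 0xFFFFFFFF)
```

If that decoding holds, the driver read a zero from the start of a table,
decremented it to 0xffffffff, and used the result as an index, which would
point at the card coming back from the reset with state the driver did not
expect rather than at a random memory corruption.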

> > Alex
> > 
> 
> --Maik
> 

--Maik
Alex Williamson Feb. 6, 2014, 3:36 a.m. UTC | #3
On Thu, 2014-02-06 at 01:25 +0100, Maik Broemme wrote:
> Hi Alex,
> 
> Maik Broemme <mbroemme@parallels.com> wrote:
> > > > > > Another minor issue is that the R9 290X is not reset during shutdown of
> > > > > > VM (neither Linux nor Windows) but it can be tricked with doing
> > > > > > "suspend-to-ram" between two starts. That's why I use '-no-reboot' option
> > > > > > in QEMU. The 7870 is doing the reset properly.
> > > > > 
> > > > > 
> > > > > Is the NoSoftRst "-" on the 290X vs "+" on the 7870 in lspci -vvv by
> > > > > chance?  Thanks,
> > > > > 
> > > > 
> > > > Here are both. It is funny it is opposite as you described. :)
> > > 
> > > 
> > > Oops, yes.  Does this help?
> > > 
> > > --- a/hw/misc/vfio.c
> > > +++ b/hw/misc/vfio.c
> > > @@ -3136,7 +3136,7 @@ static void vfio_pci_reset_handler(void *opaque)
> > >  
> > >      QLIST_FOREACH(group, &group_list, next) {
> > >          QLIST_FOREACH(vdev, &group->device_list, next) {
> > > -            if (!vdev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
> > > +            if (!vdev->reset_works || !vdev->has_flr) {
> > >                  vdev->needs_reset = true;
> > >              }
> > >          }
> > > 
> > > I can't figure out why I coded it the way that I did.  Probably overly
> > > targeting a specific device.  Thanks,
> > > 
> > 
> > This patch works absolutely fine. After applying it to my 'qemu-git', the
> > device reset works flawlessly. It would be great to push it upstream, as
> > it looks good.
> > 
> 
> Okay, sorry, I spoke too soon. It worked the first time, but now, even
> after a clean reboot, it no longer works as expected, and the behavior
> is very strange.
> 
> Windows:
> 
>   1st boot works fine - boot VGA and Windows ATI driver loaded, issue
>       reboot and qemu stopped due to '-no-reboot'.
> 
>   2nd boot works partially - boot VGA and Windows ATI driver loaded but
>       black screen, and my system becomes terribly slow and mostly
>       unresponsive. Immediately after the ATI driver enables the
>       device, my dmesg shows the following:
> 
> [  159.984324] vfio_ecap_init: 0000:01:00.0 hiding ecap 0x19@0x270
> [  159.984340] vfio_ecap_init: 0000:01:00.0 hiding ecap 0x1b@0x2d0
> [  160.129036] vfio_ecap_init: 0000:02:00.0 hiding ecap 0x19@0x270
> [  160.129049] vfio_ecap_init: 0000:02:00.0 hiding ecap 0x1b@0x2d0
> [  172.977677] kvm: zapping shadow pages for mmio generation wraparound
> [  173.160174] br0: port 2(tap0) entered forwarding state
> [  175.902967] vfio-pci 0000:01:00.0: irq 46 for MSI/MSI-X
> [  188.340430] Clocksource tsc unstable (delta = -119654611 ns)
> [  188.340511] Switched to clocksource hpet
> [  191.088693] hpet1: lost 12 rtc interrupts
> [  191.926555] hpet1: lost 25 rtc interrupts
> 
>   So your patch did indeed fix the reset issue of the boot VGA, but
>   something else is wrong now. :)

Can you try the cards separately?  If you run lspci on the device in the
host, does it report as normal?  Often when the host gets slow and we
get these sorts of clock issues it means the bus is fatal and we get
timeouts trying to read from it.
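One quick way to tell whether a device has dropped off the bus is to look
at the first bytes of its config space: reads from a device that no longer
responds come back as all ones, so the vendor ID reads 0xffff. This is a
rough sketch only; `read_config` assumes the usual Linux sysfs layout, and
both function names are invented for the example.

```python
def looks_off_bus(config_bytes):
    """Return True if the config space header reads back as all ones."""
    if len(config_bytes) < 4:
        raise ValueError("need at least the vendor/device ID dword")
    vendor = int.from_bytes(config_bytes[0:2], "little")
    device = int.from_bytes(config_bytes[2:4], "little")
    return vendor == 0xFFFF and device == 0xFFFF


def read_config(bdf, root="/sys/bus/pci/devices"):
    """Read the first 64 config bytes of a device such as '0000:01:00.0'."""
    with open(f"{root}/{bdf}/config", "rb") as handle:
        return handle.read(64)


# Synthetic examples: a dead device, and a header with vendor ID 0x1002
# (AMD/ATI) and a placeholder device ID.
dead = b"\xff" * 64
alive = b"\x02\x10\x34\x12" + b"\x00" * 60
print(looks_off_bus(dead), looks_off_bus(alive))
```

On the host this would be called as
`looks_off_bus(read_config("0000:01:00.0"))` for the first card; if the
bus really is in a fatal state, the read itself may stall for a while.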

> Linux (fglrx):
> 
>   1st boot works fine - boot VGA, fglrx loads fine and X could be
>       started, issue reboot via SSH and qemu stopped due to
>       '-no-reboot'.
> 
>   2nd boot works partially - boot VGA, fglrx loads fine but X couldn't
>       be started and fails with:
> 
> [   34.265111] fglrx_pci 0000:02:00.0: irq 50 for MSI/MSI-X
> [   34.344313] <6>[fglrx] Firegl kernel thread PID: 318
> [   34.344400] <6>[fglrx] Firegl kernel thread PID: 319
> [   34.344478] <6>[fglrx] Firegl kernel thread PID: 320
> [   34.344589] <6>[fglrx] IRQ 50 Enabled
> [   34.356105] <6>[fglrx] Reserved FB block: Shared offset:0, size:1000000 
> [   34.356107] <6>[fglrx] Reserved FB block: Unshared offset:fac3000, size:3000 
> [   34.356109] <6>[fglrx] Reserved FB block: Unshared offset:fac6000, size:23a000 
> [   34.356110] <6>[fglrx] Reserved FB block: Unshared offset:7fff4000, size:c000 
> [   34.386436] fglrx_pci 0000:01:00.0: irq 51 for MSI/MSI-X
> [   34.490902] <6>[fglrx] Firegl kernel thread PID: 321
> [   34.490994] <6>[fglrx] Firegl kernel thread PID: 322
> [   34.491069] <6>[fglrx] Firegl kernel thread PID: 323
> [   34.491166] <6>[fglrx] IRQ 51 Enabled
> [   34.505271] <6>[fglrx] Reserved FB block: Shared offset:0, size:1000000 
> [   34.505273] <6>[fglrx] Reserved FB block: Unshared offset:f9c3000, size:3000 
> [   34.505274] <6>[fglrx] Reserved FB block: Unshared offset:f9c6000, size:23a000 
> [   34.505276] <6>[fglrx] Reserved FB block: Unshared offset:fc00000, size:100000 
> [   34.505277] <6>[fglrx] Reserved FB block: Unshared offset:fff8000, size:8000 
> [   34.505278] <6>[fglrx] Reserved FB block: Unshared offset:ffff4000, size:c000 
> [   34.526198] BUG: unable to handle kernel paging request at ffff880c724e8008
> [   34.526203] IP: [<ffffffffa0399af6>] TF_PhwCIslands_PopulateAndUploadSclkMclkDPMLevels+0x96/0x3d0 [fglrx]
> [   34.526277] PGD 1b3e067 PUD 0 
> [   34.526279] Oops: 0002 [#1] PREEMPT SMP 
> [   34.526282] Modules linked in: mousedev crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev aesni_intel snd_hda_codec_hdmi aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_hda_intel microcode snd_hda_codec serio_raw psmouse parport_pc snd_hwdep snd_pcm parport snd_page_alloc processor snd_timer snd soundcore i2c_i801 intel_agp lpc_ich pcspkr intel_gtt i2c_core shpchp evdev fglrx(PO) amd_iommu_v2 button ext4 crc16 mbcache jbd2 atkbd libps2 virtio_blk virtio_net ahci libahci libata scsi_mod i8042 floppy serio virtio_pci virtio_ring virtio
> [   34.526307] CPU: 1 PID: 316 Comm: X Tainted: P           O 3.13.1-2-ARCH #1
> [   34.526309] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Bochs 01/01/2011
> [   34.526311] task: ffff8800776e2d00 ti: ffff880037a28000 task.ti: ffff880037a28000
> [   34.526312] RIP: 0010:[<ffffffffa0399af6>]  [<ffffffffa0399af6>] TF_PhwCIslands_PopulateAndUploadSclkMclkDPMLevels+0x96/0x3d0 [fglrx]
> [   34.526353] RSP: 0018:ffff880037a29810  EFLAGS: 00010296
> [   34.526354] RAX: 0000000000000001 RBX: ffff8800724e800c RCX: 0000000000000006
> [   34.526356] RDX: 0000000000000003 RSI: 0000000000000002 RDI: ffff8800724e8264
> [   34.526357] RBP: ffff88007b19a00c R08: 00000000000186a0 R09: 000000000001e848
> [   34.526358] R10: 00000002fffffffd R11: 00000000ffffffff R12: 0000000000000001
> [   34.526359] R13: ffff88007b19a00c R14: 0000000000000000 R15: ffff880037a298b0
> [   34.526363] FS:  00007f0ba649b880(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
> [   34.526365] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [   34.526366] CR2: ffff880c724e8008 CR3: 0000000037998000 CR4: 00000000000406e0
> [   34.526372] Stack:
> [   34.526373]  ffff88007b19a2f4 ffff88007bffcd1c 0000000000000001 ffffffffa0322cf0
> [   34.526375]  0000000000000000 0000000000000000 0000000000000000 ffff880077ed2c08
> [   34.526378]  0000000000000000 ffff880077ed2c08 ffff880037a298a0 ffffffffa0327f14
> [   34.526380] Call Trace:
> [   34.526435]  [<ffffffffa0322cf0>] ? PHM_DispatchTable+0xf0/0x220 [fglrx]
> [   34.526490]  [<ffffffffa0327f14>] ? PECI_NotifyDALPreAdapterClockChange+0x144/0x160 [fglrx]
> [   34.526546]  [<ffffffffa031e321>] ? PHM_SetPowerState+0x31/0xc0 [fglrx]
> [   34.526597]  [<ffffffffa0340a5b>] ? PSM_ApplyHardwareAttributes_Dynamic+0x9b/0xf0 [fglrx]
> [   34.526651]  [<ffffffffa033fde9>] ? PSM_AdjustPowerState_Dynamic+0x169/0x540 [fglrx]
> [   34.526668]  [<ffffffffa0322cf0>] ? PHM_DispatchTable+0xf0/0x220 [fglrx]
> [   34.526668]  [<ffffffffa0342ee4>] ? PEM_ExcuteEventChain+0x64/0xe0 [fglrx]
> [   34.526668]  [<ffffffffa0341302>] ? PEM_HandleEvent+0x92/0xd0 [fglrx]
> [   34.526668]  [<ffffffffa03357c0>] ? PEM_CWDDEPM_NotifyEvent+0xe0/0x4d0 [fglrx]
> [   34.526668]  [<ffffffffa0333869>] ? PP_Cwdde+0x109/0x180 [fglrx]
> [   34.526668]  [<ffffffffa02091dc>] ? firegl_pplib_cwddepm+0xbc/0x130 [fglrx]
> [   34.526668]  [<ffffffffa02092d9>] ? firegl_pplib_notify_event+0x89/0xd0 [fglrx]
> [   34.526668]  [<ffffffffa020292f>] ? hal_init_gpu+0x2bf/0x480 [fglrx]
> [   34.526668]  [<ffffffffa01dcc7b>] ? firegl_open+0x2db/0x310 [fglrx]
> [   34.526668]  [<ffffffffa01cb287>] ? ip_firegl_open+0x17/0x20 [fglrx]
> [   34.526668]  [<ffffffffa01ccac8>] ? firegl_stub_open+0x98/0x100 [fglrx]
> [   34.526668]  [<ffffffff811a82bf>] ? chrdev_open+0x9f/0x1d0
> [   34.526668]  [<ffffffff811a1967>] ? do_dentry_open+0x1b7/0x2c0
> [   34.526668]  [<ffffffff811aed41>] ? __inode_permission+0x41/0xb0
> [   34.526668]  [<ffffffff811a8220>] ? cdev_put+0x30/0x30
> [   34.526668]  [<ffffffff811a1d91>] ? finish_open+0x31/0x40
> [   34.526668]  [<ffffffff811b1b72>] ? do_last+0x572/0xe90
> [   34.526668]  [<ffffffff811af036>] ? link_path_walk+0x236/0x8d0
> [   34.526668]  [<ffffffff811b254b>] ? path_openat+0xbb/0x6b0
> [   34.526668]  [<ffffffff811b3c6a>] ? do_filp_open+0x3a/0x90
> [   34.526668]  [<ffffffff811c0567>] ? __alloc_fd+0xa7/0x130
> [   34.526668]  [<ffffffff811a2f49>] ? do_sys_open+0x129/0x220
> [   34.526668]  [<ffffffff811a305e>] ? SyS_open+0x1e/0x20
> [   34.526668]  [<ffffffff8152136d>] ? system_call_fastpath+0x1a/0x1f
> [   34.526668] Code: 8b 4a 1c 8b 93 e0 18 00 00 48 8d bb 58 02 00 00 85 d2 0f 84 63 02 00 00 f6 c2 01 0f 84 20 01 00 00 44 8b 1b 41 ff cb 4f 8d 14 5b <46> 89 44 93 08 8b 95 3c 02 00 00 48 89 d0 48 c1 e8 07 a8 01 75 
> [   34.526668] RIP  [<ffffffffa0399af6>] TF_PhwCIslands_PopulateAndUploadSclkMclkDPMLevels+0x96/0x3d0 [fglrx]
> [   34.526668]  RSP <ffff880037a29810>
> [   34.526668] CR2: ffff880c724e8008
> [   34.526668] ---[ end trace 5431e6dcf1c31dea ]---
> [   69.317528] type=1006 audit(1391649552.046:4): pid=324 uid=0 old auid=4294967295 new auid=0 old ses=4294967295 new ses=3 res=1
> 
> I know it is the binary driver. I will also retry with the radeon driver,
> but I believe there will be a similar crash. In my first try I just
> rebooted the Linux VM several times without starting X.
> 
> I got it working one time without getting 'Clocksource tsc unstable', but
> now I'm unable to reproduce it. So I believe something more is needed.

Bus resets are a mixed blessing: a reset returns the card to a relatively
known state, but it's a fairly unusual event from a platform perspective,
and we have no idea what kind of quirks the host system BIOS might have
in place to work around hardware.  If the bus is not fatal you might try
running lspci -vvv in the host at various points to see what changed.
For instance, boot a Linux guest to text mode and see if the card is in
the same state between first boot and second boot before starting X.
Thanks,

Alex
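The comparison suggested above is easier with saved snapshots than by eye.
A minimal sketch, assuming the output of `lspci -vvv -s 01:00.0` was
captured into two strings (for example one per boot); the function name
and labels are made up for illustration.

```python
import difflib


def changed_lines(before, after, label_before="first boot",
                  label_after="second boot"):
    """Return only the added/removed lines between two lspci snapshots."""
    diff = list(difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile=label_before, tofile=label_after,
        lineterm="", n=0))
    # Skip the two '---'/'+++' file header lines, then drop '@@' hunk markers.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


# Synthetic snapshots; real input would be full 'lspci -vvv' output.
first = "Status: D0 NoSoftRst+ PME-Enable-\nLnkSta: Speed 5GT/s, Width x16"
second = "Status: D3 NoSoftRst+ PME-Enable-\nLnkSta: Speed 5GT/s, Width x16"
for line in changed_lines(first, second):
    print(line)
```

Taking one snapshot before the guest starts, one with the guest at a text
console, and one after shutdown, for each boot, keeps the diffs small
enough to spot a power state or link status that did not come back.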

Patch

--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -3136,7 +3136,7 @@  static void vfio_pci_reset_handler(void *opaque)
 
     QLIST_FOREACH(group, &group_list, next) {
         QLIST_FOREACH(vdev, &group->device_list, next) {
-            if (!vdev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
+            if (!vdev->reset_works || !vdev->has_flr) {
                 vdev->needs_reset = true;
             }
         }
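To see exactly which devices the one-line change affects, the old and new
conditions can be enumerated over all flag combinations. This is a Python
model of the two C expressions in the diff, not QEMU code. The mapping in
the comments (has_flr from lspci's FLReset+, has_pm_reset from NoSoftRst-)
is an assumption about how QEMU fills these fields.

```python
from itertools import product


def needs_reset_old(reset_works, has_flr, has_pm_reset):
    """Condition before the patch."""
    return (not reset_works) or ((not has_flr) and has_pm_reset)


def needs_reset_new(reset_works, has_flr, has_pm_reset):
    """Condition after the patch; has_pm_reset no longer matters."""
    return (not reset_works) or (not has_flr)


def changed_cases():
    """Flag combinations (reset_works, has_flr, has_pm_reset) that differ."""
    return [flags for flags in product([False, True], repeat=3)
            if needs_reset_old(*flags) != needs_reset_new(*flags)]


print(changed_cases())

# Assumed flags for the two cards, with reset_works taken to be True:
# 290X: FLReset-, NoSoftRst+  -> has_flr=False, has_pm_reset=False
# 7870: FLReset-, NoSoftRst-  -> has_flr=False, has_pm_reset=True
print(needs_reset_old(True, False, False), needs_reset_new(True, False, False))
print(needs_reset_old(True, False, True), needs_reset_new(True, False, True))
```

The only combination that changes is a device whose reset works but which
has neither FLR nor a PM reset. Under the assumed mapping that is the
290X, which matches the report that only the 7870 was being reset before
the patch.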