Patchwork pci: add standard bridge device

login
register
mail settings
Submitter Wen Congyang
Date Aug. 18, 2011, 3:22 a.m.
Message ID <4E4C8577.5000608@cn.fujitsu.com>
Download mbox | patch
Permalink /patch/110494/
State New
Headers show

Comments

Wen Congyang - Aug. 18, 2011, 3:22 a.m.
At 08/17/2011 04:37 PM, Wen Congyang Write:
> At 07/04/2011 05:43 PM, Michael S. Tsirkin Write:
>> This adds support for a standard pci to pci bridge,
>> enabling support for more than 32 PCI devices in the system.
>> To use, specify the device id as a 'bus' option.
>> Example:
>> 	-device pci-bridge,id=bridge1 \
>> 	-netdev user,id=u \
>> 	-device ne2k_pci,id=net2,bus=bridge1,netdev=u
>>
>> TODO: device hotplug support.
> 
> I try this patch, and found that when I use pci bridge, qemu will core dump.
> 
> Here is my command line:
> /usr/local2/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm -m 512 -name vm1 -drive file=/var/lib/libvirt/images/vm1.img,if=none,id=drive-ide0-0-0,format=qcow2,cache=writethrough -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -vnc 0.0.0.0:1 -device pci-bridge,id=bridge1,bus=pci.0,addr=0x08.0x0 -netdev user,id=u -device ne2k_pci,id=net2,bus=bridge1,netdev=u
> 
> Here is the backtrace:
> Core was generated by `/usr/local2/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm -m 512 -name vm1 -dri'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x0000000000438e34 in memory_region_add_subregion_common (mr=0x0, offset=49152, subregion=0x1de5d58) at /home/wency/source/qemu/memory.c:1152
> 1152	    QTAILQ_FOREACH(other, &mr->subregions, subregions_link) {
> Missing separate debuginfos, use: debuginfo-install SDL-1.2.14-2.el6.x86_64 celt051-0.5.1.3-0.el6.x86_64 cyrus-sasl-gssapi-2.1.23-8.el6.x86_64 cyrus-sasl-lib-2.1.23-8.el6.x86_64 cyrus-sasl-md5-2.1.23-8.el6.x86_64 cyrus-sasl-plain-2.1.23-8.el6.x86_64 db4-4.7.25-16.el6.x86_64 glib2-2.22.5-6.el6.x86_64 glibc-2.12-1.25.el6.x86_64 keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.9-9.el6.x86_64 libX11-1.3-2.el6.x86_64 libXau-1.0.5-1.el6.x86_64 libaio-0.3.107-10.el6.x86_64 libattr-2.4.44-4.el6.x86_64 libcom_err-1.41.12-7.el6.x86_64 libcurl-7.19.7-26.el6.x86_64 libgcrypt-1.4.5-5.el6.x86_64 libgpg-error-1.7-3.el6.x86_64 libidn-1.18-2.el6.x86_64 libjpeg-6b-46.el6.x86_64 libpng-1.2.44-1.el6.x86_64 libselinux-2.0.94-5.el6.x86_64 libssh2-1.2.2-7.el6.x86_64 libtasn1-2.3-3.el6.x86_64 libuuid-2.17.2-12.el6.x86_64 libxcb-1.5-1.el6.x86_64 ncurses-libs-5.7-3.20090208.el6.x86_64 nspr-4.8.7-1.el6.x86_64 nss-3.12.9-9.el6.x86_64 nss-softokn-freebl-3.12.9-3.el6.x86_64 nss-util-3.12.9-1.el6.x86_64 openld

ap
> -2.4.23-15.el6.x86_64 openssl-1.0.0-10.el6.x86_64 pixman-0.18.4-1.el6_0.1.x86_64 spice-server-0.8.0-1.el6.x86_64 zlib-1.2.3-25.el6.x86_64
> (gdb) bt
> #0  0x0000000000438e34 in memory_region_add_subregion_common (mr=0x0, offset=49152, subregion=0x1de5d58) at /home/wency/source/qemu/memory.c:1152
> #1  0x0000000000439090 in memory_region_add_subregion_overlap (mr=0x0, offset=49152, subregion=0x1de5d58, priority=1) at /home/wency/source/qemu/memory.c:1194
> #2  0x00000000005c55fe in pci_update_mappings (d=0x1de5900) at /home/wency/source/qemu/hw/pci.c:1063
> #3  0x00000000005c5982 in pci_default_write_config (d=0x1de5900, addr=4, val=0, l=2) at /home/wency/source/qemu/hw/pci.c:1121
> #4  0x00000000005cbfbf in pci_host_config_write_common (pci_dev=0x1de5900, addr=4, limit=256, val=1, len=2) at /home/wency/source/qemu/hw/pci_host.c:54
> #5  0x00000000005cc0d1 in pci_data_write (s=0x1da2b90, addr=2147549188, val=1, len=2) at /home/wency/source/qemu/hw/pci_host.c:75
> #6  0x00000000005cc2b1 in pci_host_data_write (handler=0x1da2b60, addr=3324, val=1, len=2) at /home/wency/source/qemu/hw/pci_host.c:125
> #7  0x000000000042c884 in ioport_simple_writew (opaque=0x1da2b60, addr=3324, value=1) at /home/wency/source/qemu/rwhandler.c:50
> #8  0x0000000000499e85 in ioport_write (index=1, address=3324, data=1) at ioport.c:81
> #9  0x000000000049a8e1 in cpu_outw (addr=3324, val=1) at ioport.c:280
> #10 0x0000000000433c5d in kvm_handle_io (port=3324, data=0x7f0b30f86000, direction=1, size=2, count=1) at /home/wency/source/qemu/kvm-all.c:837
> #11 0x00000000004341c8 in kvm_cpu_exec (env=0x1b7fc70) at /home/wency/source/qemu/kvm-all.c:976
> #12 0x000000000040da99 in cpu_exec_all () at /home/wency/source/qemu/cpus.c:1102
> #13 0x00000000005b60c4 in main_loop () at /home/wency/source/qemu/vl.c:1392
> #14 0x00000000005baa49 in main (argc=20, argv=0x7ffffa6b5a38, envp=0x7ffffa6b5ae0) at /home/wency/source/qemu/vl.c:3356
> 
> If I do not attach any device on bus bridge1, qemu can work nice.
> 
> Thanks
> Wen Congyang
> 

The following patch can fix this problem, but I'm not sure whether it is right.

>From 3ce0000e5a14f0ff7aeac148f9416eac6fa7c6ca Mon Sep 17 00:00:00 2001
From: Wen Congyang <wency@cn.fujitsu.com>
Date: Thu, 18 Aug 2011 09:33:19 +0800
Subject: [PATCH] PCI_Bridge: use parent bus's address space

The pci device may call pci_register_bar() to use PCI bus's address space.
But we forget to init PCI bus's address space if it is not bus 0. It will
cause qemu crashed.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>

---
 hw/pci_bridge.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)
Avi Kivity - Aug. 18, 2011, 3:15 p.m.
On 08/17/2011 08:22 PM, Wen Congyang wrote:
> At 08/17/2011 04:37 PM, Wen Congyang Write:
> >  At 07/04/2011 05:43 PM, Michael S. Tsirkin Write:
> >>  This adds support for a standard pci to pci bridge,
> >>  enabling support for more than 32 PCI devices in the system.
> >>  To use, specify the device id as a 'bus' option.
> >>  Example:
> >>  	-device pci-bridge,id=bridge1 \
> >>  	-netdev user,id=u \
> >>  	-device ne2k_pci,id=net2,bus=bridge1,netdev=u
> >>
> >>  TODO: device hotplug support.
> >
> >  I try this patch, and found that when I use pci bridge, qemu will core dump.
> >
> >  Here is my command line:
> >  /usr/local2/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm -m 512 -name vm1 -drive file=/var/lib/libvirt/images/vm1.img,if=none,id=drive-ide0-0-0,format=qcow2,cache=writethrough -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -vnc 0.0.0.0:1 -device pci-bridge,id=bridge1,bus=pci.0,addr=0x08.0x0 -netdev user,id=u -device ne2k_pci,id=net2,bus=bridge1,netdev=u
> >
> >  Here is the backtrace:
> >  Core was generated by `/usr/local2/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm -m 512 -name vm1 -dri'.
> >  Program terminated with signal 11, Segmentation fault.
> >  #0  0x0000000000438e34 in memory_region_add_subregion_common (mr=0x0, offset=49152, subregion=0x1de5d58) at /home/wency/source/qemu/memory.c:1152
> >  1152	    QTAILQ_FOREACH(other,&mr->subregions, subregions_link) {
> >  Missing separate debuginfos, use: debuginfo-install SDL-1.2.14-2.el6.x86_64 celt051-0.5.1.3-0.el6.x86_64 cyrus-sasl-gssapi-2.1.23-8.el6.x86_64 cyrus-sasl-lib-2.1.23-8.el6.x86_64 cyrus-sasl-md5-2.1.23-8.el6.x86_64 cyrus-sasl-plain-2.1.23-8.el6.x86_64 db4-4.7.25-16.el6.x86_64 glib2-2.22.5-6.el6.x86_64 glibc-2.12-1.25.el6.x86_64 keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.9-9.el6.x86_64 libX11-1.3-2.el6.x86_64 libXau-1.0.5-1.el6.x86_64 libaio-0.3.107-10.el6.x86_64 libattr-2.4.44-4.el6.x86_64 libcom_err-1.41.12-7.el6.x86_64 libcurl-7.19.7-26.el6.x86_64 libgcrypt-1.4.5-5.el6.x86_64 libgpg-error-1.7-3.el6.x86_64 libidn-1.18-2.el6.x86_64 libjpeg-6b-46.el6.x86_64 libpng-1.2.44-1.el6.x86_64 libselinux-2.0.94-5.el6.x86_64 libssh2-1.2.2-7.el6.x86_64 libtasn1-2.3-3.el6.x86_64 libuuid-2.17.2-12.el6.x86_64 libxcb-1.5-1.el6.x86_64 ncurses-libs-5.7-3.20090208.el6.x86_64 nspr-4.8.7-1.el6.x86_64 nss-3.12.9-9.el6.x86_64 nss-softokn-freebl-3.12.9-3.el6.x86_64 nss-util-3.12.9-1.el6.x86_64 openld
>
> ap
> >  -2.4.23-15.el6.x86_64 openssl-1.0.0-10.el6.x86_64 pixman-0.18.4-1.el6_0.1.x86_64 spice-server-0.8.0-1.el6.x86_64 zlib-1.2.3-25.el6.x86_64
> >  (gdb) bt
> >  #0  0x0000000000438e34 in memory_region_add_subregion_common (mr=0x0, offset=49152, subregion=0x1de5d58) at /home/wency/source/qemu/memory.c:1152
> >  #1  0x0000000000439090 in memory_region_add_subregion_overlap (mr=0x0, offset=49152, subregion=0x1de5d58, priority=1) at /home/wency/source/qemu/memory.c:1194
> >  #2  0x00000000005c55fe in pci_update_mappings (d=0x1de5900) at /home/wency/source/qemu/hw/pci.c:1063
> >  #3  0x00000000005c5982 in pci_default_write_config (d=0x1de5900, addr=4, val=0, l=2) at /home/wency/source/qemu/hw/pci.c:1121
> >  #4  0x00000000005cbfbf in pci_host_config_write_common (pci_dev=0x1de5900, addr=4, limit=256, val=1, len=2) at /home/wency/source/qemu/hw/pci_host.c:54
> >  #5  0x00000000005cc0d1 in pci_data_write (s=0x1da2b90, addr=2147549188, val=1, len=2) at /home/wency/source/qemu/hw/pci_host.c:75
> >  #6  0x00000000005cc2b1 in pci_host_data_write (handler=0x1da2b60, addr=3324, val=1, len=2) at /home/wency/source/qemu/hw/pci_host.c:125
> >  #7  0x000000000042c884 in ioport_simple_writew (opaque=0x1da2b60, addr=3324, value=1) at /home/wency/source/qemu/rwhandler.c:50
> >  #8  0x0000000000499e85 in ioport_write (index=1, address=3324, data=1) at ioport.c:81
> >  #9  0x000000000049a8e1 in cpu_outw (addr=3324, val=1) at ioport.c:280
> >  #10 0x0000000000433c5d in kvm_handle_io (port=3324, data=0x7f0b30f86000, direction=1, size=2, count=1) at /home/wency/source/qemu/kvm-all.c:837
> >  #11 0x00000000004341c8 in kvm_cpu_exec (env=0x1b7fc70) at /home/wency/source/qemu/kvm-all.c:976
> >  #12 0x000000000040da99 in cpu_exec_all () at /home/wency/source/qemu/cpus.c:1102
> >  #13 0x00000000005b60c4 in main_loop () at /home/wency/source/qemu/vl.c:1392
> >  #14 0x00000000005baa49 in main (argc=20, argv=0x7ffffa6b5a38, envp=0x7ffffa6b5ae0) at /home/wency/source/qemu/vl.c:3356
> >
> >  If I do not attach any device on bus bridge1, qemu can work nice.
> >
> >  Thanks
> >  Wen Congyang
> >
>
> The following patch can fix this problem, but I'm not sure whether it is right.

It's correct but insufficient, the filtering code (pci_bridge_filter) 
needs to be updated to use the memory API.

Basically it gets simpler and correcter.
Wen Congyang - Aug. 19, 2011, 5:12 a.m.
At 08/18/2011 11:15 PM, Avi Kivity Write:
> On 08/17/2011 08:22 PM, Wen Congyang wrote:
>> At 08/17/2011 04:37 PM, Wen Congyang Write:
>> >  At 07/04/2011 05:43 PM, Michael S. Tsirkin Write:
>> >>  This adds support for a standard pci to pci bridge,
>> >>  enabling support for more than 32 PCI devices in the system.
>> >>  To use, specify the device id as a 'bus' option.
>> >>  Example:
>> >>      -device pci-bridge,id=bridge1 \
>> >>      -netdev user,id=u \
>> >>      -device ne2k_pci,id=net2,bus=bridge1,netdev=u
>> >>
>> >>  TODO: device hotplug support.
>> >
>> >  I try this patch, and found that when I use pci bridge, qemu will
>> core dump.
>> >
>> >  Here is my command line:
>> >  /usr/local2/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm -m 512
>> -name vm1 -drive
>> file=/var/lib/libvirt/images/vm1.img,if=none,id=drive-ide0-0-0,format=qcow2,cache=writethrough
>> -device
>> ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -vnc
>> 0.0.0.0:1 -device pci-bridge,id=bridge1,bus=pci.0,addr=0x08.0x0
>> -netdev user,id=u -device ne2k_pci,id=net2,bus=bridge1,netdev=u
>> >
>> >  Here is the backtrace:
>> >  Core was generated by `/usr/local2/bin/qemu-system-x86_64 -M
>> pc-0.14 -enable-kvm -m 512 -name vm1 -dri'.
>> >  Program terminated with signal 11, Segmentation fault.
>> >  #0  0x0000000000438e34 in memory_region_add_subregion_common
>> (mr=0x0, offset=49152, subregion=0x1de5d58) at
>> /home/wency/source/qemu/memory.c:1152
>> >  1152        QTAILQ_FOREACH(other,&mr->subregions, subregions_link) {
>> >  Missing separate debuginfos, use: debuginfo-install
>> SDL-1.2.14-2.el6.x86_64 celt051-0.5.1.3-0.el6.x86_64
>> cyrus-sasl-gssapi-2.1.23-8.el6.x86_64
>> cyrus-sasl-lib-2.1.23-8.el6.x86_64 cyrus-sasl-md5-2.1.23-8.el6.x86_64
>> cyrus-sasl-plain-2.1.23-8.el6.x86_64 db4-4.7.25-16.el6.x86_64
>> glib2-2.22.5-6.el6.x86_64 glibc-2.12-1.25.el6.x86_64
>> keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.9-9.el6.x86_64
>> libX11-1.3-2.el6.x86_64 libXau-1.0.5-1.el6.x86_64
>> libaio-0.3.107-10.el6.x86_64 libattr-2.4.44-4.el6.x86_64
>> libcom_err-1.41.12-7.el6.x86_64 libcurl-7.19.7-26.el6.x86_64
>> libgcrypt-1.4.5-5.el6.x86_64 libgpg-error-1.7-3.el6.x86_64
>> libidn-1.18-2.el6.x86_64 libjpeg-6b-46.el6.x86_64
>> libpng-1.2.44-1.el6.x86_64 libselinux-2.0.94-5.el6.x86_64
>> libssh2-1.2.2-7.el6.x86_64 libtasn1-2.3-3.el6.x86_64
>> libuuid-2.17.2-12.el6.x86_64 libxcb-1.5-1.el6.x86_64
>> ncurses-libs-5.7-3.20090208.el6.x86_64 nspr-4.8.7-1.el6.x86_64
>> nss-3.12.9-9.el6.x86_64 nss-softokn-freebl-3.12.9-3.el6.x86_64
>> nss-util-3.12.9-1.el6.x86_64 openld
>>
>> ap
>> >  -2.4.23-15.el6.x86_64 openssl-1.0.0-10.el6.x86_64
>> pixman-0.18.4-1.el6_0.1.x86_64 spice-server-0.8.0-1.el6.x86_64
>> zlib-1.2.3-25.el6.x86_64
>> >  (gdb) bt
>> >  #0  0x0000000000438e34 in memory_region_add_subregion_common
>> (mr=0x0, offset=49152, subregion=0x1de5d58) at
>> /home/wency/source/qemu/memory.c:1152
>> >  #1  0x0000000000439090 in memory_region_add_subregion_overlap
>> (mr=0x0, offset=49152, subregion=0x1de5d58, priority=1) at
>> /home/wency/source/qemu/memory.c:1194
>> >  #2  0x00000000005c55fe in pci_update_mappings (d=0x1de5900) at
>> /home/wency/source/qemu/hw/pci.c:1063
>> >  #3  0x00000000005c5982 in pci_default_write_config (d=0x1de5900,
>> addr=4, val=0, l=2) at /home/wency/source/qemu/hw/pci.c:1121
>> >  #4  0x00000000005cbfbf in pci_host_config_write_common
>> (pci_dev=0x1de5900, addr=4, limit=256, val=1, len=2) at
>> /home/wency/source/qemu/hw/pci_host.c:54
>> >  #5  0x00000000005cc0d1 in pci_data_write (s=0x1da2b90,
>> addr=2147549188, val=1, len=2) at
>> /home/wency/source/qemu/hw/pci_host.c:75
>> >  #6  0x00000000005cc2b1 in pci_host_data_write (handler=0x1da2b60,
>> addr=3324, val=1, len=2) at /home/wency/source/qemu/hw/pci_host.c:125
>> >  #7  0x000000000042c884 in ioport_simple_writew (opaque=0x1da2b60,
>> addr=3324, value=1) at /home/wency/source/qemu/rwhandler.c:50
>> >  #8  0x0000000000499e85 in ioport_write (index=1, address=3324,
>> data=1) at ioport.c:81
>> >  #9  0x000000000049a8e1 in cpu_outw (addr=3324, val=1) at ioport.c:280
>> >  #10 0x0000000000433c5d in kvm_handle_io (port=3324,
>> data=0x7f0b30f86000, direction=1, size=2, count=1) at
>> /home/wency/source/qemu/kvm-all.c:837
>> >  #11 0x00000000004341c8 in kvm_cpu_exec (env=0x1b7fc70) at
>> /home/wency/source/qemu/kvm-all.c:976
>> >  #12 0x000000000040da99 in cpu_exec_all () at
>> /home/wency/source/qemu/cpus.c:1102
>> >  #13 0x00000000005b60c4 in main_loop () at
>> /home/wency/source/qemu/vl.c:1392
>> >  #14 0x00000000005baa49 in main (argc=20, argv=0x7ffffa6b5a38,
>> envp=0x7ffffa6b5ae0) at /home/wency/source/qemu/vl.c:3356
>> >
>> >  If I do not attach any device on bus bridge1, qemu can work nice.
>> >
>> >  Thanks
>> >  Wen Congyang
>> >
>>
>> The following patch can fix this problem, but I'm not sure whether it
>> is right.
> 
> It's correct but insufficient, the filtering code (pci_bridge_filter)
> needs to be updated to use the memory API.

I read the function pci_bridge_filter(), and the function only read
PCI bridge's config space(command, base and limit). If base > limit,
it will set addr to PCI_BAR_UNMAPPED.

I do not find anything that needs to updated to use the memory API.

I add a scsi controller on pci bus1, and a scsi disk on this controller.
I can read and write this disk, and I do not meet any problem.

Thanks
Wen Congyang

> 
> Basically it gets simpler and correcter.
>
Avi Kivity - Aug. 19, 2011, 3:26 p.m.
On 08/18/2011 10:12 PM, Wen Congyang wrote:
> >>
> >>  The following patch can fix this problem, but I'm not sure whether it
> >>  is right.
> >
> >  It's correct but insufficient, the filtering code (pci_bridge_filter)
> >  needs to be updated to use the memory API.
>
> I read the function pci_bridge_filter(), and the function only read
> PCI bridge's config space(command, base and limit). If base>  limit,
> it will set addr to PCI_BAR_UNMAPPED.
>
> I do not find anything that needs to updated to use the memory API.

Currently it doesn't do any filtering at all.  Bridges need to create a 
new address space, then attach aliases of this region (corresponding to 
the filtered area and to the legacy vga space) to the parent bus' 
address space.

> I add a scsi controller on pci bus1, and a scsi disk on this controller.
> I can read and write this disk, and I do not meet any problem.
>

However, filtering doesn't work.  You could put a BAR outside the 
filtered area and it would be visible to the guest.
Wen Congyang - Aug. 22, 2011, 3:13 a.m.
At 08/19/2011 11:26 PM, Avi Kivity Write:
> On 08/18/2011 10:12 PM, Wen Congyang wrote:
>> >>
>> >>  The following patch can fix this problem, but I'm not sure whether it
>> >>  is right.
>> >
>> >  It's correct but insufficient, the filtering code (pci_bridge_filter)
>> >  needs to be updated to use the memory API.
>>
>> I read the function pci_bridge_filter(), and the function only read
>> PCI bridge's config space(command, base and limit). If base>  limit,
>> it will set addr to PCI_BAR_UNMAPPED.
>>
>> I do not find anything that needs to updated to use the memory API.
> 
> Currently it doesn't do any filtering at all.  Bridges need to create a
> new address space, then attach aliases of this region (corresponding to
> the filtered area and to the legacy vga space) to the parent bus'
> address space.

Hmm, does this problem exist before memory API is introduced?

> 
>> I add a scsi controller on pci bus1, and a scsi disk on this controller.
>> I can read and write this disk, and I do not meet any problem.
>>
> 
> However, filtering doesn't work.  You could put a BAR outside the
> filtered area and it would be visible to the guest.

How to put a BAR outside the filtered area and confirm whether it would be
virible to the guest?
Avi Kivity - Aug. 22, 2011, 6:23 a.m.
On 08/22/2011 06:13 AM, Wen Congyang wrote:
> At 08/19/2011 11:26 PM, Avi Kivity Write:
> >  On 08/18/2011 10:12 PM, Wen Congyang wrote:
> >>  >>
> >>  >>   The following patch can fix this problem, but I'm not sure whether it
> >>  >>   is right.
> >>  >
> >>  >   It's correct but insufficient, the filtering code (pci_bridge_filter)
> >>  >   needs to be updated to use the memory API.
> >>
> >>  I read the function pci_bridge_filter(), and the function only read
> >>  PCI bridge's config space(command, base and limit). If base>   limit,
> >>  it will set addr to PCI_BAR_UNMAPPED.
> >>
> >>  I do not find anything that needs to updated to use the memory API.
> >
> >  Currently it doesn't do any filtering at all.  Bridges need to create a
> >  new address space, then attach aliases of this region (corresponding to
> >  the filtered area and to the legacy vga space) to the parent bus'
> >  address space.
>
> Hmm, does this problem exist before memory API is introduced?

Yes.  There was code to handle it, but I don't think it was correct.

>
> >
> >>  I add a scsi controller on pci bus1, and a scsi disk on this controller.
> >>  I can read and write this disk, and I do not meet any problem.
> >>
> >
> >  However, filtering doesn't work.  You could put a BAR outside the
> >  filtered area and it would be visible to the guest.
>
> How to put a BAR outside the filtered area and confirm whether it would be
> virible to the guest?
>
>


You could use something like kvm-unit-tests.git to write a simple test 
that sets up a BAR (say from hw/ivshmem.c), writes and reads to see that 
it is visible, programs the bridge to filter part of the BAR out, then 
writes and reads again to verify that the correct part is filtered out.
Michael S. Tsirkin - Aug. 26, 2011, 9:43 a.m.
On Thu, Aug 18, 2011 at 08:15:43AM -0700, Avi Kivity wrote:
> It's correct but insufficient, the filtering code
> (pci_bridge_filter) needs to be updated to use the memory API.
> 
> Basically it gets simpler and correcter.

I've been struggling with the following problem: bridges have two memory
ranges: prefetcheable and non-prefetcheable.

Memory in the device can be behind the prefetcheable and
non-prefetcheable memory range, but things only work correctly if
non-prefetcheable memory on the device is put behind a non-prefetcheable
range. Prefetcheable memory can go anywhere I think.

This didn't work correctly before the memory API change,
but it was easy to fix ... Now I'm not sure how.
Michael S. Tsirkin - Aug. 26, 2011, 9:57 a.m.
On Thu, Aug 18, 2011 at 11:22:31AM +0800, Wen Congyang wrote:
> >From 3ce0000e5a14f0ff7aeac148f9416eac6fa7c6ca Mon Sep 17 00:00:00 2001
> From: Wen Congyang <wency@cn.fujitsu.com>
> Date: Thu, 18 Aug 2011 09:33:19 +0800
> Subject: [PATCH] PCI_Bridge: use parent bus's address space
> 
> The pci device may call pci_register_bar() to use PCI bus's address space.
> But we forget to init PCI bus's address space if it is not bus 0. It will
> cause qemu crashed.
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>

I've applied this for now so we can make progress.

> ---
>  hw/pci_bridge.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/pci_bridge.c b/hw/pci_bridge.c
> index 464d897..df16faa 100644
> --- a/hw/pci_bridge.c
> +++ b/hw/pci_bridge.c
> @@ -246,6 +246,8 @@ int pci_bridge_initfn(PCIDevice *dev)
>                          br->bus_name);
>      sec_bus->parent_dev = dev;
>      sec_bus->map_irq = br->map_irq;
> +    sec_bus->address_space_mem = parent->address_space_mem;
> +    sec_bus->address_space_io = parent->address_space_io;
>  
>      QLIST_INIT(&sec_bus->child);
>      QLIST_INSERT_HEAD(&parent->child, sec_bus, sibling);
> -- 
> 1.7.1
Avi Kivity - Aug. 28, 2011, 7:50 a.m.
On 08/26/2011 12:43 PM, Michael S. Tsirkin wrote:
> On Thu, Aug 18, 2011 at 08:15:43AM -0700, Avi Kivity wrote:
> >  It's correct but insufficient, the filtering code
> >  (pci_bridge_filter) needs to be updated to use the memory API.
> >
> >  Basically it gets simpler and correcter.
>
> I've been struggling with the following problem: bridges have two memory
> ranges: prefetcheable and non-prefetcheable.
>
> Memory in the device can be behind the prefetcheable and
> non-prefetcheable memory range, but things only work correctly if
> non-prefetcheable memory on the device is put behind a non-prefetcheable
> range. Prefetcheable memory can go anywhere I think.
>
> This didn't work correctly before the memory API change,
> but it was easy to fix ... Now I'm not sure how.

If it really matters, you can add a prefetchability attribute to 
MemoryRegions.  Does it though?
Michael S. Tsirkin - Aug. 28, 2011, 11:41 a.m.
On Sun, Aug 28, 2011 at 10:50:49AM +0300, Avi Kivity wrote:
> On 08/26/2011 12:43 PM, Michael S. Tsirkin wrote:
> >On Thu, Aug 18, 2011 at 08:15:43AM -0700, Avi Kivity wrote:
> >>  It's correct but insufficient, the filtering code
> >>  (pci_bridge_filter) needs to be updated to use the memory API.
> >>
> >>  Basically it gets simpler and correcter.
> >
> >I've been struggling with the following problem: bridges have two memory
> >ranges: prefetcheable and non-prefetcheable.
> >
> >Memory in the device can be behind the prefetcheable and
> >non-prefetcheable memory range, but things only work correctly if
> >non-prefetcheable memory on the device is put behind a non-prefetcheable
> >range. Prefetcheable memory can go anywhere I think.
> >
> >This didn't work correctly before the memory API change,
> >but it was easy to fix ... Now I'm not sure how.
> 
> If it really matters, you can add a prefetchability attribute to
> MemoryRegions.  Does it though?

Well, its another one of these things that 
guests *probably* won't in practice use.
But I don't see a way to be sure.

If the guest puts a prefetcheable memory BAR behind
a non-prefetcheable range in the bridge, it won't
be able to access that BAR, and it should.

Prefetcheable BARs on devices are less common than
non-prefetcheable, but they do exist:
I have a system with 2 devices: a VGA controller from Matrox
and an ethernet card from Mellanox have
prefetcheable BARs.

I'm not sure how prefetcheability attribute will help.
Could you explain pls?


> -- 
> I have a truly marvellous patch that fixes the bug which this
> signature is too narrow to contain.
Avi Kivity - Aug. 28, 2011, 1:10 p.m.
On 08/28/2011 02:41 PM, Michael S. Tsirkin wrote:
> >
> >  If it really matters, you can add a prefetchability attribute to
> >  MemoryRegions.  Does it though?
>
> Well, its another one of these things that
> guests *probably* won't in practice use.
> But I don't see a way to be sure.
>
> If the guest puts a prefetcheable memory BAR behind
> a non-prefetcheable range in the bridge, it won't
> be able to access that BAR, and it should.

Not sure I understand - on real hardware, does it see the BAR or not?

>
> Prefetcheable BARs on devices are less common than
> non-prefetcheable, but they do exist:
> I have a system with 2 devices: a VGA controller from Matrox
> and an ethernet card from Mellanox have
> prefetcheable BARs.
>
> I'm not sure how prefetcheability attribute will help.
> Could you explain pls?
>

Have an attribute for a region "does not allow prefetchable memory" in 
subregions
Have an attribute for a region "prefetchable memory"

When rendering the memory map, ignore any "prefetchable memory" regions 
that are subregions of a "does not allow prefetchable memory" region.

(if I understood correctly - not sure)
Michael S. Tsirkin - Aug. 28, 2011, 1:42 p.m.
On Sun, Aug 28, 2011 at 04:10:14PM +0300, Avi Kivity wrote:
> On 08/28/2011 02:41 PM, Michael S. Tsirkin wrote:
> >>
> >>  If it really matters, you can add a prefetchability attribute to
> >>  MemoryRegions.  Does it though?
> >
> >Well, its another one of these things that
> >guests *probably* won't in practice use.
> >But I don't see a way to be sure.
> >
> >If the guest puts a prefetcheable memory BAR behind
> >a non-prefetcheable range in the bridge, it won't
> >be able to access that BAR, and it should.
> 
> Not sure I understand - on real hardware, does it see the BAR or not?

It does.

> >
> >Prefetcheable BARs on devices are less common than
> >non-prefetcheable, but they do exist:
> >I have a system with 2 devices: a VGA controller from Matrox
> >and an ethernet card from Mellanox have
> >prefetcheable BARs.
> >
> >I'm not sure how prefetcheability attribute will help.
> >Could you explain pls?
> >
> 
> Have an attribute for a region "does not allow prefetchable memory"
> in subregions
> Have an attribute for a region "prefetchable memory"
> 
> When rendering the memory map, ignore any "prefetchable memory"
> regions that are subregions of a "does not allow prefetchable
> memory" region.
> 
> (if I understood correctly - not sure)

Sorry, I've been unclear. Or maybe I misunderstood?

ATM we have each BAR as a memory region, which is
in turn within io or memory address space region.
With bridges, each bridge has a single range
covering legal io addresses below it, and two ranges for memory.

Example from a real system:
        Memory behind bridge: 98200000-982fffff
        Prefetchable memory behind bridge: 0000000097000000-00000000977fffff

And a device can have:

        Region 0: Memory at 98200000 (64-bit, non-prefetchable) [size=1M]
        Region 2: Memory at 97000000 (64-bit, prefetchable) [size=8M]


guest can in theory reprogram this:

        Memory behind bridge: 98200000-98afffff
        Prefetchable memory behind bridge: 0000000097000000-00000000977fffff

and
        Region 0: Memory at 98200000 (64-bit, non-prefetchable) [size=1M]
        Region 2: Memory at 98300000 (64-bit, prefetchable) [size=8M]

and the device will work (presumably, guests will try to avoid this
as they assume prefetchable ranges are faster).

Thus, which range the device BAR is behind depends on the
programmed values. How do we model that?


A side note on bus filtering:
In cases of bus range partially hinding the BAR from the guest, one can
even have multiple non-contigious bits of the BAR visible and the rest
hidden.

I'm not saying it's very important to model this,
I'm guessing the only important cases are all of the
BAR visible and all of the BAR hidden.


> -- 
> error compiling committee.c: too many arguments to function
Avi Kivity - Aug. 28, 2011, 1:53 p.m.
On 08/28/2011 04:42 PM, Michael S. Tsirkin wrote:
> On Sun, Aug 28, 2011 at 04:10:14PM +0300, Avi Kivity wrote:
> >  On 08/28/2011 02:41 PM, Michael S. Tsirkin wrote:
> >  >>
> >  >>   If it really matters, you can add a prefetchability attribute to
> >  >>   MemoryRegions.  Does it though?
> >  >
> >  >Well, its another one of these things that
> >  >guests *probably* won't in practice use.
> >  >But I don't see a way to be sure.
> >  >
> >  >If the guest puts a prefetcheable memory BAR behind
> >  >a non-prefetcheable range in the bridge, it won't
> >  >be able to access that BAR, and it should.
> >
> >  Not sure I understand - on real hardware, does it see the BAR or not?
>
> It does.

Ok, was different from what I thought.  So anything that matches the two 
windows is exposed (after clipping).  If the guest enables the legacy 
vga range, then there are three regions, with equal treatment, yes?

> ATM we have each BAR as a memory region, which is
> in turn within io or memory address space region.
> With bridges, each bridge has a single range
> covering legal io addresses below it, and two ranges for memory.
>
> Example from a real system:
>          Memory behind bridge: 98200000-982fffff
>          Prefetchable memory behind bridge: 0000000097000000-00000000977fffff
>
> And a device can have:
>
>          Region 0: Memory at 98200000 (64-bit, non-prefetchable) [size=1M]
>          Region 2: Memory at 97000000 (64-bit, prefetchable) [size=8M]
>
>
> guest can in theory reprogram this:
>
>          Memory behind bridge: 98200000-98afffff
>          Prefetchable memory behind bridge: 0000000097000000-00000000977fffff
>
> and
>          Region 0: Memory at 98200000 (64-bit, non-prefetchable) [size=1M]
>          Region 2: Memory at 98300000 (64-bit, prefetchable) [size=8M]
>
> and the device will work (presumably, guests will try to avoid this
> as they assume prefetchable ranges are faster).

> Thus, which range the device BAR is behind depends on the
> programmed values. How do we model that?

Create a memory region for the bridge's address space.  This region is 
not directly added to system_memory or its descendants.  Devices under 
the bridge see this region as its pci_address_space().  The region is as 
large as the entire address space - it does not take into account any 
windows.

For each of the three windows (pref, non-pref, vga), create an alias 
with the appropriate start and size.  Map the alias into the bridge's 
parent's pci_address_space(), as subregions.

fx440 does exactly this, with the following cosmetic changes:

- the windows are different (vga, pci hole, 64-bit pci area, PAMx, SMRAM)
- instead of mapping them to the parent bridge's pci_address_space(), we 
map them to get_system_memory()

> A side note on bus filtering:
> In cases of bus range partially hinding the BAR from the guest, one can
> even have multiple non-contigious bits of the BAR visible and the rest
> hidden.

The memory API will deal with this perfectly.

> I'm not saying it's very important to model this,
> I'm guessing the only important cases are all of the
> BAR visible and all of the BAR hidden.

It should all work.  Including a sub-bridge's windows being fragmented 
by the parent bridge.  Too bad it doesn't matter in practice, because 
it's a really neat solution to this non-problem.
Wen Congyang - Sept. 2, 2011, 1:32 a.m.
At 08/22/2011 02:23 PM, Avi Kivity Write:
> On 08/22/2011 06:13 AM, Wen Congyang wrote:
>> At 08/19/2011 11:26 PM, Avi Kivity Write:
>> >  On 08/18/2011 10:12 PM, Wen Congyang wrote:
>> >>  >>
>> >>  >>   The following patch can fix this problem, but I'm not sure
>> whether it
>> >>  >>   is right.
>> >>  >
>> >>  >   It's correct but insufficient, the filtering code
>> (pci_bridge_filter)
>> >>  >   needs to be updated to use the memory API.
>> >>
>> >>  I read the function pci_bridge_filter(), and the function only read
>> >>  PCI bridge's config space(command, base and limit). If base>   limit,
>> >>  it will set addr to PCI_BAR_UNMAPPED.
>> >>
>> >>  I do not find anything that needs to updated to use the memory API.
>> >
>> >  Currently it doesn't do any filtering at all.  Bridges need to
>> create a
>> >  new address space, then attach aliases of this region
>> (corresponding to
>> >  the filtered area and to the legacy vga space) to the parent bus'
>> >  address space.
>>
>> Hmm, does this problem exist before memory API is introduced?
> 
> Yes.  There was code to handle it, but I don't think it was correct.
> 
>>
>> >
>> >>  I add a scsi controller on pci bus1, and a scsi disk on this
>> controller.
>> >>  I can read and write this disk, and I do not meet any problem.
>> >>
>> >
>> >  However, filtering doesn't work.  You could put a BAR outside the
>> >  filtered area and it would be visible to the guest.
>>
>> How to put a BAR outside the filtered area and confirm whether it
>> would be
>> virible to the guest?
>>
>>
> 
> 
> You could use something like kvm-unit-tests.git to write a simple test
> that sets up a BAR (say from hw/ivshmem.c), writes and reads to see that
> it is visible, programs the bridge to filter part of the BAR out, then
> writes and reads again to verify that the correct part is filtered out.
> 

I test it with rtl8139(Because I have a such real hardware).
I add some fprintf in the callback function: rtl8139_mmio_writeb(),
rtl8139_mmio_writel(), rtl8139_mmio_writew(), rtl8139_mmio_readb(),
rtl8139_mmio_readw(), rtl8139_mmio_readl().
My test way:
1. unbind the device from rtl8139 driver
2. modify the BAR
3. reprobe the device

If the BAR is visible, the OS will access it when we reprobe the device.
If the BAR is not visible, the OS will access it.
I can not reprobe the real device if the BAR is not visible. But I can
reprobe the virtual device if the BAR is not visible.

Thanks
Wen Congyang
Wen Congyang - Sept. 2, 2011, 2:56 a.m.
At 08/22/2011 02:23 PM, Avi Kivity Write:
> On 08/22/2011 06:13 AM, Wen Congyang wrote:
>> At 08/19/2011 11:26 PM, Avi Kivity Write:
>> >  On 08/18/2011 10:12 PM, Wen Congyang wrote:
>> >>  >>
>> >>  >>   The following patch can fix this problem, but I'm not sure
>> whether it
>> >>  >>   is right.
>> >>  >
>> >>  >   It's correct but insufficient, the filtering code
>> (pci_bridge_filter)
>> >>  >   needs to be updated to use the memory API.
>> >>
>> >>  I read the function pci_bridge_filter(), and the function only read
>> >>  PCI bridge's config space(command, base and limit). If base>   limit,
>> >>  it will set addr to PCI_BAR_UNMAPPED.
>> >>
>> >>  I do not find anything that needs to updated to use the memory API.
>> >
>> >  Currently it doesn't do any filtering at all.  Bridges need to
>> create a
>> >  new address space, then attach aliases of this region
>> (corresponding to
>> >  the filtered area and to the legacy vga space) to the parent bus'
>> >  address space.
>>
>> Hmm, does this problem exist before memory API is introduced?
> 
> Yes.  There was code to handle it, but I don't think it was correct.
> 
>>
>> >
>> >>  I add a scsi controller on pci bus1, and a scsi disk on this
>> controller.
>> >>  I can read and write this disk, and I do not meet any problem.
>> >>
>> >
>> >  However, filtering doesn't work.  You could put a BAR outside the
>> >  filtered area and it would be visible to the guest.
>>
>> How to put a BAR outside the filtered area and confirm whether it
>> would be
>> virible to the guest?
>>
>>
> 
> 
> You could use something like kvm-unit-tests.git to write a simple test
> that sets up a BAR (say from hw/ivshmem.c), writes and reads to see that
> it is visible, programs the bridge to filter part of the BAR out, then
> writes and reads again to verify that the correct part is filtered out.

I am testing ivshmem now. But I do not know how to access the memory
specified in the BAR.

Thanks
Wen Congyang

>
Avi Kivity - Sept. 4, 2011, 8:25 a.m.
On 09/02/2011 05:56 AM, Wen Congyang wrote:
> >
> >  You could use something like kvm-unit-tests.git to write a simple test
> >  that sets up a BAR (say from hw/ivshmem.c), writes and reads to see that
> >  it is visible, programs the bridge to filter part of the BAR out, then
> >  writes and reads again to verify that the correct part is filtered out.
>
> I am testing ivshmem now. But I do not know how to access the memory
> specified in the BAR.
>
>

Use the uio driver - 
http://docs.blackfin.uclinux.org/kernel/generated/uio-howto/.  You just 
mmap() the BAR from userspace and play with it.
Michael S. Tsirkin - Sept. 4, 2011, 12:30 p.m.
On Sun, Aug 28, 2011 at 04:53:33PM +0300, Avi Kivity wrote:
> On 08/28/2011 04:42 PM, Michael S. Tsirkin wrote:
> >On Sun, Aug 28, 2011 at 04:10:14PM +0300, Avi Kivity wrote:
> >>  On 08/28/2011 02:41 PM, Michael S. Tsirkin wrote:
> >>  >>
> >>  >>   If it really matters, you can add a prefetchability attribute to
> >>  >>   MemoryRegions.  Does it though?
> >>  >
> >>  >Well, its another one of these things that
> >>  >guests *probably* won't in practice use.
> >>  >But I don't see a way to be sure.
> >>  >
> >>  >If the guest puts a prefetcheable memory BAR behind
> >>  >a non-prefetcheable range in the bridge, it won't
> >>  >be able to access that BAR, and it should.
> >>
> >>  Not sure I understand - on real hardware, does it see the BAR or not?
> >
> >It does.
> 
> Ok, was different from what I thought.  So anything that matches the
> two windows is exposed (after clipping).  If the guest enables the
> legacy vga range, then there are three regions, with equal
> treatment, yes?
> 
> >ATM we have each BAR as a memory region, which is
> >in turn within io or memory address space region.
> >With bridges, each bridge has a single range
> >covering legal io addresses below it, and two ranges for memory.
> >
> >Example from a real system:
> >         Memory behind bridge: 98200000-982fffff
> >         Prefetchable memory behind bridge: 0000000097000000-00000000977fffff
> >
> >And a device can have:
> >
> >         Region 0: Memory at 98200000 (64-bit, non-prefetchable) [size=1M]
> >         Region 2: Memory at 97000000 (64-bit, prefetchable) [size=8M]
> >
> >
> >guest can in theory reprogram this:
> >
> >         Memory behind bridge: 98200000-98afffff
> >         Prefetchable memory behind bridge: 0000000097000000-00000000977fffff
> >
> >and
> >         Region 0: Memory at 98200000 (64-bit, non-prefetchable) [size=1M]
> >         Region 2: Memory at 98300000 (64-bit, prefetchable) [size=8M]
> >
> >and the device will work (presumably, guests will try to avoid this
> >as they assume prefetchable ranges are faster).
> 
> >Thus, which range the device BAR is behind depends on the
> >programmed values. How do we model that?
> 
> Create a memory region for the bridge's address space.  This region
> is not directly added to system_memory or its descendants.

I do this for each bridge in the hierarchy, right?

> Devices
> under the bridge see this region as its pci_address_space().  The
> region is as large as the entire address space - it does not take
> into account any windows.
> 
> For each of the three windows (pref, non-pref, vga), create an alias
> with the appropriate start and size.  Map the alias into the
> bridge's parent's pci_address_space(), as subregions.
> 
> fx440 does exactly this, with the following cosmetic changes:
> 
> - the windows are different (vga, pci hole, 64-bit pci area, PAMx, SMRAM)
> - instead of mapping them to the parent bridge's
> pci_address_space(), we map them to get_system_memory()
> 
> >A side note on bus filtering:
> >In cases of bus range partially hinding the BAR from the guest, one can
> >even have multiple non-contigious bits of the BAR visible and the rest
> >hidden.
> 
> The memory API will deal with this perfectly.
> 
> >I'm not saying it's very important to model this,
> >I'm guessing the only important cases are all of the
> >BAR visible and all of the BAR hidden.
> 
> It should all work.  Including a sub-bridge's windows being
> fragmented by the parent bridge.  Too bad it doesn't matter in
> practice, because it's a really neat solution to this non-problem.

Hmm, what ties the windows of a child bridge
to be within the windows of a parent?



> -- 
> error compiling committee.c: too many arguments to function
Avi Kivity - Sept. 4, 2011, 12:40 p.m.
On 09/04/2011 03:30 PM, Michael S. Tsirkin wrote:
> >
> >  Create a memory region for the bridge's address space.  This region
> >  is not directly added to system_memory or its descendants.
>
> I do this for each bridge in the hierarchy, right?

Each bridge does this independently (so yes).

> >  fx440 does exactly this, with the following cosmetic changes:
> >
> >  - the windows are different (vga, pci hole, 64-bit pci area, PAMx, SMRAM)
> >  - instead of mapping them to the parent bridge's
> >  pci_address_space(), we map them to get_system_memory()
>
> Hmm, what ties the windows of a child bridge
> to be within the windows of a parent?
>

system_memory
   |
   +--- pci0_alias0 (aliases part of pci0)

pci0
   |
   +--- pci1_alias0 (a bridge)

pci1
   |
   +--- pci2_alias0 (another bridge)

pci2
   |
   +--- BAR

When rendering the memory hierarchy, the only parts of BAR which are 
visible are those that fit the clipping regions pci0_alias0 ^ 
pci1_alias0 ^ pci2_alias0.  If there are multiple aliases (like the low 
and high PCI holes, and PAM, it becomes (pci0_alias0 v pci0_alias1) ^ 
(pci1_alias0v pci1_alias1) ^ (pci2_alias0 v pci2_alias1). ( "^" == 
intersection, "v" == union )
Michael S. Tsirkin - Sept. 4, 2011, 1:01 p.m.
On Sun, Sep 04, 2011 at 03:40:58PM +0300, Avi Kivity wrote:
> On 09/04/2011 03:30 PM, Michael S. Tsirkin wrote:
> >>
> >>  Create a memory region for the bridge's address space.  This region
> >>  is not directly added to system_memory or its descendants.
> >
> >I do this for each bridge in the hierarchy, right?
> 
> Each bridge does this independently (so yes).
> 
> >>  fx440 does exactly this, with the following cosmetic changes:
> >>
> >>  - the windows are different (vga, pci hole, 64-bit pci area, PAMx, SMRAM)
> >>  - instead of mapping them to the parent bridge's
> >>  pci_address_space(), we map them to get_system_memory()
> >
> >Hmm, what ties the windows of a child bridge
> >to be within the windows of a parent?
> >
> 
> system_memory
>   |
>   +--- pci0_alias0 (aliases part of pci0)
> 
> pci0
>   |
>   +--- pci1_alias0 (a bridge)
> 
> pci1
>   |
>   +--- pci2_alias0 (another bridge)
> 
> pci2
>   |
>   +--- BAR
> 
> When rendering the memory hierarchy, the only parts of BAR which are
> visible are those that fit the clipping regions pci0_alias0 ^
> pci1_alias0 ^ pci2_alias0.  If there are multiple aliases (like the
> low and high PCI holes, and PAM, it becomes (pci0_alias0 v
> pci0_alias1) ^ (pci1_alias0v pci1_alias1) ^ (pci2_alias0 v
> pci2_alias1). ( "^" == intersection, "v" == union )

What about BAR directly behind pci0?
That should be unaffected by bridges pci1 and pci2
since the device is not behind that.


> -- 
> error compiling committee.c: too many arguments to function
Avi Kivity - Sept. 4, 2011, 1:05 p.m.
On 09/04/2011 04:01 PM, Michael S. Tsirkin wrote:
> On Sun, Sep 04, 2011 at 03:40:58PM +0300, Avi Kivity wrote:
> >  On 09/04/2011 03:30 PM, Michael S. Tsirkin wrote:
> >  >>
> >  >>   Create a memory region for the bridge's address space.  This region
> >  >>   is not directly added to system_memory or its descendants.
> >  >
> >  >I do this for each bridge in the hierarchy, right?
> >
> >  Each bridge does this independently (so yes).
> >
> >  >>   fx440 does exactly this, with the following cosmetic changes:
> >  >>
> >  >>   - the windows are different (vga, pci hole, 64-bit pci area, PAMx, SMRAM)
> >  >>   - instead of mapping them to the parent bridge's
> >  >>   pci_address_space(), we map them to get_system_memory()
> >  >
> >  >Hmm, what ties the windows of a child bridge
> >  >to be within the windows of a parent?
> >  >
> >
> >  system_memory
> >    |
> >    +--- pci0_alias0 (aliases part of pci0)
> >
> >  pci0
> >    |
> >    +--- pci1_alias0 (a bridge)
> >
> >  pci1
> >    |
> >    +--- pci2_alias0 (another bridge)
> >
> >  pci2
> >    |
> >    +--- BAR
> >
> >  When rendering the memory hierarchy, the only parts of BAR which are
> >  visible are those that fit the clipping regions pci0_alias0 ^
> >  pci1_alias0 ^ pci2_alias0.  If there are multiple aliases (like the
> >  low and high PCI holes, and PAM, it becomes (pci0_alias0 v
> >  pci0_alias1) ^ (pci1_alias0v pci1_alias1) ^ (pci2_alias0 v
> >  pci2_alias1). ( "^" == intersection, "v" == union )
>
> What about BAR directly behind pci0?
> That should be unaffected by bridges pci1 and pci2
> since the device is not behind that.
>

It follows naturally:

system_memory
   |
   +--- pci0_alias0 (aliases part of pci0)
                |
pci0 <-+
   |
   +--- pci1_alias0 (a bridge)
   |            |
   +--- BAR0.0  |
                |
pci1 <-+
   |
   +--- pci2_alias0 (another bridge)
   |            |
   +--- BAR1.0  |
                |
pci2 <-+
   |
   +--- BAR2.0


BAR0.0 is only filtered by pci0_alias*, BAR1.0 is filtered by 
pci[01]_alias*.  Since pci_register_bar() adds the BAR as a subregion of 
the bridge's pci_address_space(), it works without any global knowledge, 
with this topology, or if pci1 and pci2 are siblings instead of parent 
and child.
Avi Kivity - Sept. 4, 2011, 1:09 p.m.
On 09/04/2011 04:05 PM, Avi Kivity wrote:
>
> It follows naturally:
>
> system_memory
>   |
>   +--- pci0_alias0 (aliases part of pci0)
>                |
> pci0 <-+

The pointer from pci0_alias to pci0 looked a lot better when I drew it.
Wen Congyang - Sept. 6, 2011, 3:06 a.m.
At 09/04/2011 04:25 PM, Avi Kivity Write:
> On 09/02/2011 05:56 AM, Wen Congyang wrote:
>> >
>> >  You could use something like kvm-unit-tests.git to write a simple test
>> >  that sets up a BAR (say from hw/ivshmem.c), writes and reads to see
>> that
>> >  it is visible, programs the bridge to filter part of the BAR out, then
>> >  writes and reads again to verify that the correct part is filtered
>> out.
>>
>> I am testing ivshmem now. But I do not know how to access the memory
>> specified in the BAR.
>>
>>
> 
> Use the uio driver -
> http://docs.blackfin.uclinux.org/kernel/generated/uio-howto/.  You just
> mmap() the BAR from userspace and play with it.

When I try to bind ivshmem to uio_pci_generic, I get the following messages:
uio_pci_generic 0000:01:01.0: No IRQ assigned to device: no support for interrupts?

Thanks
Wen Congyang
Avi Kivity - Sept. 6, 2011, 7:45 a.m.
On 09/06/2011 06:06 AM, Wen Congyang wrote:
> >  Use the uio driver -
> >  http://docs.blackfin.uclinux.org/kernel/generated/uio-howto/.  You just
> >  mmap() the BAR from userspace and play with it.
>
> When I try to bind ivshmem to uio_pci_generic, I get the following messages:
> uio_pci_generic 0000:01:01.0: No IRQ assigned to device: no support for interrupts?
>

No idea what this means.
Wen Congyang - Sept. 7, 2011, 4:39 a.m.
At 09/06/2011 03:45 PM, Avi Kivity Write:
> On 09/06/2011 06:06 AM, Wen Congyang wrote:
>> >  Use the uio driver -
>> >  http://docs.blackfin.uclinux.org/kernel/generated/uio-howto/.  You
>> just
>> >  mmap() the BAR from userspace and play with it.
>>
>> When I try to bind ivshmem to uio_pci_generic, I get the following
>> messages:
>> uio_pci_generic 0000:01:01.0: No IRQ assigned to device: no support
>> for interrupts?
>>
> 
> No idea what this means.

PCI 3.0 6.2.4
For x86 based PCs, the values in this register correspond to IRQ numbers (0-15) of the standard dual
8259 configuration. The value 255 is defined as meaning "unknown" or "no connection" to the interrupt
controller. Values between 15 and 254 are reserved.

The register is interrupt line.

I read the config of this device, the interrupt line is 0. It means that it uses the IRQ0.

The following is the uio_pci_generic's code:
static int __devinit probe(struct pci_dev *pdev,
			   const struct pci_device_id *id)
{
	struct uio_pci_generic_dev *gdev;
	int err;

	err = pci_enable_device(pdev);
	if (err) {
		dev_err(&pdev->dev, "%s: pci_enable_device failed: %d\n",
			__func__, err);
		return err;
	}

	if (!pdev->irq) {
		dev_warn(&pdev->dev, "No IRQ assigned to device: "
			 "no support for interrupts?\n");
		pci_disable_device(pdev);
		return -ENODEV;
	}
...
}

This function will be called when we write 'domain:bus:slot.function' to /sys/bus/pci/drivers/uio_pci_generic/bind.
pdev->irq is 0, it means the device uses IRQ0. But we refuse it. I do not why.

To Michael S. Tsirkin
This code is writen by you. Do you know why you check whether pdev->irq is 0?

Thanks
Wen Congyang

>
Michael S. Tsirkin - Sept. 7, 2011, 11:52 a.m.
On Wed, Sep 07, 2011 at 12:39:09PM +0800, Wen Congyang wrote:
> At 09/06/2011 03:45 PM, Avi Kivity Write:
> > On 09/06/2011 06:06 AM, Wen Congyang wrote:
> >> >  Use the uio driver -
> >> >  http://docs.blackfin.uclinux.org/kernel/generated/uio-howto/.  You
> >> just
> >> >  mmap() the BAR from userspace and play with it.
> >>
> >> When I try to bind ivshmem to uio_pci_generic, I get the following
> >> messages:
> >> uio_pci_generic 0000:01:01.0: No IRQ assigned to device: no support
> >> for interrupts?
> >>
> > 
> > No idea what this means.
> 
> PCI 3.0 6.2.4
> For x86 based PCs, the values in this register correspond to IRQ numbers (0-15) of the standard dual
> 8259 configuration. The value 255 is defined as meaning "unknown" or "no connection" to the interrupt
> controller. Values between 15 and 254 are reserved.
> 
> The register is interrupt line.
> 
> I read the config of this device, the interrupt line is 0. It means that it uses the IRQ0.
> 
> The following is the uio_pci_generic's code:
> static int __devinit probe(struct pci_dev *pdev,
> 			   const struct pci_device_id *id)
> {
> 	struct uio_pci_generic_dev *gdev;
> 	int err;
> 
> 	err = pci_enable_device(pdev);
> 	if (err) {
> 		dev_err(&pdev->dev, "%s: pci_enable_device failed: %d\n",
> 			__func__, err);
> 		return err;
> 	}
> 
> 	if (!pdev->irq) {
> 		dev_warn(&pdev->dev, "No IRQ assigned to device: "
> 			 "no support for interrupts?\n");
> 		pci_disable_device(pdev);
> 		return -ENODEV;
> 	}
> ...
> }
> 
> This function will be called when we write 'domain:bus:slot.function' to /sys/bus/pci/drivers/uio_pci_generic/bind.
> pdev->irq is 0, it means the device uses IRQ0. But we refuse it. I do not why.
> 
> To Michael S. Tsirkin
> This code is writen by you. Do you know why you check whether pdev->irq is 0?
> 
> Thanks
> Wen Congyang
> 
> > 

Well I see this in linux:

/*
 * Read interrupt line and base address registers.
 * The architecture-dependent code can tweak these, of course.
 */
static void pci_read_irq(struct pci_dev *dev)
{
        unsigned char irq;

        pci_read_config_byte(dev, PCI_INTERRUPT_PIN, &irq);
        dev->pin = irq;
        if (irq)
                pci_read_config_byte(dev, PCI_INTERRUPT_LINE, &irq);
        dev->irq = irq;
}

Thus a device without an interrupt pin will get irq set to 0,
and this seems the right way to detect such devices.
I don't think PCI devices really use IRQ0 in practice,
its probably used for PC things. More likely the system is
misconfigured.  Try lspci -vv to see what went wrong.
Wen Congyang - Sept. 8, 2011, 6:15 a.m.
At 09/07/2011 07:52 PM, Michael S. Tsirkin Write:
> On Wed, Sep 07, 2011 at 12:39:09PM +0800, Wen Congyang wrote:
>> At 09/06/2011 03:45 PM, Avi Kivity Write:
>>> On 09/06/2011 06:06 AM, Wen Congyang wrote:
>>>>>  Use the uio driver -
>>>>>  http://docs.blackfin.uclinux.org/kernel/generated/uio-howto/.  You
>>>> just
>>>>>  mmap() the BAR from userspace and play with it.
>>>>
>>>> When I try to bind ivshmem to uio_pci_generic, I get the following
>>>> messages:
>>>> uio_pci_generic 0000:01:01.0: No IRQ assigned to device: no support
>>>> for interrupts?
>>>>
>>>
>>> No idea what this means.
>>
>> PCI 3.0 6.2.4
>> For x86 based PCs, the values in this register correspond to IRQ numbers (0-15) of the standard dual
>> 8259 configuration. The value 255 is defined as meaning "unknown" or "no connection" to the interrupt
>> controller. Values between 15 and 254 are reserved.
>>
>> The register is interrupt line.
>>
>> I read the config of this device, the interrupt line is 0. It means that it uses the IRQ0.
>>
>> The following is the uio_pci_generic's code:
>> static int __devinit probe(struct pci_dev *pdev,
>> 			   const struct pci_device_id *id)
>> {
>> 	struct uio_pci_generic_dev *gdev;
>> 	int err;
>>
>> 	err = pci_enable_device(pdev);
>> 	if (err) {
>> 		dev_err(&pdev->dev, "%s: pci_enable_device failed: %d\n",
>> 			__func__, err);
>> 		return err;
>> 	}
>>
>> 	if (!pdev->irq) {
>> 		dev_warn(&pdev->dev, "No IRQ assigned to device: "
>> 			 "no support for interrupts?\n");
>> 		pci_disable_device(pdev);
>> 		return -ENODEV;
>> 	}
>> ...
>> }
>>
>> This function will be called when we write 'domain:bus:slot.function' to /sys/bus/pci/drivers/uio_pci_generic/bind.
>> pdev->irq is 0, it means the device uses IRQ0. But we refuse it. I do not why.
>>
>> To Michael S. Tsirkin
>> This code is writen by you. Do you know why you check whether pdev->irq is 0?
>>
>> Thanks
>> Wen Congyang
>>
>>>
> 
> Well I see this in linux:
> 
> /*
>  * Read interrupt line and base address registers.
>  * The architecture-dependent code can tweak these, of course.
>  */
> static void pci_read_irq(struct pci_dev *dev)
> {
>         unsigned char irq;
> 
>         pci_read_config_byte(dev, PCI_INTERRUPT_PIN, &irq);
>         dev->pin = irq;
>         if (irq)
>                 pci_read_config_byte(dev, PCI_INTERRUPT_LINE, &irq);
>         dev->irq = irq;
> }
> 
> Thus a device without an interrupt pin will get irq set to 0,
> and this seems the right way to detect such devices.
> I don't think PCI devices really use IRQ0 in practice,
> its probably used for PC things. More likely the system is
> misconfigured.  Try lspci -vv to see what went wrong.

Yes, the PCI device shoulde not use IRQ0. I debug qemu's code, and find the
PCI_INTERRUPT_LINE register is not set by qemu:
=============
Hardware watchpoint 6: ((uint8_t *) 0x164e410)[0x3c]

Old value = 0 '\000'
New value = 10 '\n'
pci_default_write_config (d=0x1653ed0, addr=60, val=10, l=1) at /home/wency/source/qemu/hw/pci.c:1115
1115	        d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */
Missing separate debuginfos, use: debuginfo-install cyrus-sasl-gssapi-2.1.23-8.el6.x86_64 cyrus-sasl-md5-2.1.23-8.el6.x86_64 cyrus-sasl-plain-2.1.23-8.el6.x86_64 db4-4.7.25-16.el6.x86_64
(gdb) bt
#0  pci_default_write_config (d=0x1653ed0, addr=60, val=10, l=1) at /home/wency/source/qemu/hw/pci.c:1115
#1  0x00000000004d5827 in pci_host_config_write_common (pci_dev=0x1653ed0, addr=60, limit=256, val=10, len=1) at /home/wency/source/qemu/hw/pci_host.c:54
#2  0x00000000004d5939 in pci_data_write (s=0x15f95a0, addr=2147502140, val=10, len=1) at /home/wency/source/qemu/hw/pci_host.c:75
#3  0x00000000004d5b19 in pci_host_data_write (handler=0x15f9570, addr=3324, val=10, len=1) at /home/wency/source/qemu/hw/pci_host.c:125
#4  0x000000000063ee06 in ioport_simple_writeb (opaque=0x15f9570, addr=3324, value=10) at /home/wency/source/qemu/rwhandler.c:48
#5  0x0000000000470db9 in ioport_write (index=0, address=3324, data=10) at ioport.c:81
#6  0x00000000004717bc in cpu_outb (addr=3324, val=10 '\n') at ioport.c:273
#7  0x00000000005ef25d in kvm_handle_io (port=3324, data=0x7ffff7ff8000, direction=1, size=1, count=1) at /home/wency/source/qemu/kvm-all.c:834
#8  0x00000000005ef7e6 in kvm_cpu_exec (env=0x13da0d0) at /home/wency/source/qemu/kvm-all.c:976
#9  0x00000000005c1a7b in qemu_kvm_cpu_thread_fn (arg=0x13da0d0) at /home/wency/source/qemu/cpus.c:661
#10 0x00000032864077e1 in start_thread () from /lib64/libpthread.so.0
#11 0x00000032858e68ed in clone () from /lib64/libc.so.6
=============

If I put ivshmem on bus 0, the PCI_INTERRUPT_LINE register can be set. So I guess this register is set by bios.
I use the newest seabios, and PCI_INTERRUPT_LINE register is not set if the deivce is not on bus0.

# lspci -vv
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
	Subsystem: Red Hat, Inc Qemu virtual machine
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
	Subsystem: Red Hat, Inc Qemu virtual machine
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] (prog-if 80 [Master])
	Subsystem: Red Hat, Inc Qemu virtual machine
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Region 0: [virtual] Memory at 000001f0 (32-bit, non-prefetchable) [size=8]
	Region 1: [virtual] Memory at 000003f0 (type 3, non-prefetchable) [size=1]
	Region 2: [virtual] Memory at 00000170 (32-bit, non-prefetchable) [size=8]
	Region 3: [virtual] Memory at 00000370 (type 3, non-prefetchable) [size=1]
	Region 4: I/O ports at d100 [size=16]
	Kernel driver in use: ata_piix
	Kernel modules: ata_piix

00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
	Subsystem: Red Hat, Inc Qemu virtual machine
	Physical Slot: 1
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 9
	Kernel driver in use: piix4_smbus
	Kernel modules: i2c-piix4

00:02.0 VGA compatible controller: Cirrus Logic GD 5446 (prog-if 00 [VGA controller])
	Subsystem: Red Hat, Inc Device 1100
	Physical Slot: 2
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Region 0: Memory at fc000000 (32-bit, prefetchable) [size=32M]
	Region 1: Memory at f8020000 (32-bit, non-prefetchable) [size=4K]
	Expansion ROM at f8000000 [disabled] [size=64K]
	Kernel modules: cirrusfb

00:03.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 20)
	Subsystem: Red Hat, Inc Device 1100
	Physical Slot: 3
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 11
	Region 0: I/O ports at d000 [size=256]
	Region 1: Memory at f8021000 (32-bit, non-prefetchable) [size=256]
	Expansion ROM at f8010000 [disabled] [size=64K]
	Kernel driver in use: 8139cp
	Kernel modules: 8139too, 8139cp

00:08.0 PCI bridge: Red Hat, Inc. Device 0001 (prog-if 00 [Normal decode])
	Physical Slot: 8
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: 0000c000-0000cfff
	Memory behind bridge: f0000000-f7ffffff
	Prefetchable memory behind bridge: fe000000-fe0fffff
	Secondary status: 66MHz+ FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-

01:01.0 RAM memory: Red Hat, Inc Device 1110
	Subsystem: Red Hat, Inc Device 1100
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 0
	Region 0: Memory at f4000000 (32-bit, non-prefetchable) [disabled] [size=256]
	Region 2: Memory at f0000000 (32-bit, non-prefetchable) [disabled] [size=32M]
	Kernel modules: virtio_pci

01:01.1 RAM memory: Red Hat, Inc Device 1110
	Subsystem: Red Hat, Inc Device 1100
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 0
	Region 0: Memory at f4001000 (32-bit, non-prefetchable) [disabled] [size=256]
	Region 2: Memory at f2000000 (32-bit, non-prefetchable) [disabled] [size=32M]
	Kernel modules: virtio_pci


>
Wen Congyang - Sept. 9, 2011, 6:43 a.m.
At 08/19/2011 11:26 PM, Avi Kivity Write:
> On 08/18/2011 10:12 PM, Wen Congyang wrote:
>> >>
>> >>  The following patch can fix this problem, but I'm not sure whether it
>> >>  is right.
>> >
>> >  It's correct but insufficient, the filtering code (pci_bridge_filter)
>> >  needs to be updated to use the memory API.
>>
>> I read the function pci_bridge_filter(), and the function only read
>> PCI bridge's config space(command, base and limit). If base>  limit,
>> it will set addr to PCI_BAR_UNMAPPED.
>>
>> I do not find anything that needs to updated to use the memory API.
> 
> Currently it doesn't do any filtering at all.  Bridges need to create a
> new address space, then attach aliases of this region (corresponding to
> the filtered area and to the legacy vga space) to the parent bus'
> address space.
> 
>> I add a scsi controller on pci bus1, and a scsi disk on this controller.
>> I can read and write this disk, and I do not meet any problem.
>>
> 
> However, filtering doesn't work.  You could put a BAR outside the
> filtered area and it would be visible to the guest.
> 

I test it on real hardware. If I put a BAR outside the filterer area, and
then run 'lspci -vv', the BAR does not change:
# lspci -vv
...
00:1c.1 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 2 (rev 01) (prog-if 00 [Normal decode])
...
	Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
	I/O behind bridge: 0000d000-0000dfff
	Memory behind bridge: fea00000-feafffff
	Prefetchable memory behind bridge: 0000000080400000-00000000805fffff
...
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1) (prog-if 01 [Subtractive decode])
...
	Bus: primary=00, secondary=03, subordinate=03, sec-latency=32
	I/O behind bridge: 0000e000-0000efff
	Memory behind bridge: feb00000-febfffff
	Prefetchable memory behind bridge: 0000000080600000-00000000806fffff
...
03:01.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
...
	Region 0: I/O ports at a800 [size=256]
	Region 1: Memory at febfbc00 (32-bit, non-prefetchable) [size=256]
	Expansion ROM at 80600000 [disabled] [size=128K]
...

# od -t x1 /sys/bus/pci/devices/0000\:03\:01.0/config
0000000 ec 10 39 81 03 00 90 82 10 00 00 02 00 00 00 00
0000020 01 a8 00 00 00 bc af fe 00 00 00 00 00 00 00 00
0000040 00 00 00 00 00 00 00 00 00 00 00 00 ec 10 39 81
0000060 00 00 bc fe 50 00 00 00 00 00 00 00 06 01 20 40
0000100 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000120 01 00 02 76 00 00 00 00 00 00 00 00 00 00 00 00
0000140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0000400
The BAR1 is feafbc00, and it is in the bus2's range.
I map the BAR(mmap /sys/bus/pci/devices/0000\:03\:01.0/resource1), and find
I can read and write the memory.

Thanks
Wen Congyang
Michael S. Tsirkin - Sept. 9, 2011, 7:12 a.m.
On Fri, Sep 09, 2011 at 02:43:24PM +0800, Wen Congyang wrote:
> > However, filtering doesn't work.  You could put a BAR outside the
> > filtered area and it would be visible to the guest.
> > 
> 
> I test it on real hardware. If I put a BAR outside the filterer area, and
> then run 'lspci -vv', the BAR does not change:

...


> The BAR1 is feafbc00, and it is in the bus2's range.
> I map the BAR(mmap /sys/bus/pci/devices/0000\:03\:01.0/resource1), and find
> I can read and write the memory.
> 
> Thanks
> Wen Congyang

So, it's as expected. Nothing seems wrong with this picture. But
this is not the test that Avi suggested.
Wen Congyang - Sept. 9, 2011, 7:24 a.m.
At 09/09/2011 03:12 PM, Michael S. Tsirkin Write:
> On Fri, Sep 09, 2011 at 02:43:24PM +0800, Wen Congyang wrote:
>>> However, filtering doesn't work.  You could put a BAR outside the
>>> filtered area and it would be visible to the guest.
>>>
>>
>> I test it on real hardware. If I put a BAR outside the filterer area, and
>> then run 'lspci -vv', the BAR does not change:
> 
> ...
> 
> 
>> The BAR1 is feafbc00, and it is in the bus2's range.
>> I map the BAR(mmap /sys/bus/pci/devices/0000\:03\:01.0/resource1), and find
>> I can read and write the memory.
>>
>> Thanks
>> Wen Congyang
> 
> So, it's as expected. Nothing seems wrong with this picture. But
> this is not the test that Avi suggested.

Sorry for my misunderstand.
My question is: How to put a BAR outside the filterer area, and how to know
whether it is visible?

Thanks
Wen Congyang
Michael S. Tsirkin - Sept. 9, 2011, 7:34 a.m.
On Fri, Sep 09, 2011 at 03:24:54PM +0800, Wen Congyang wrote:
> At 09/09/2011 03:12 PM, Michael S. Tsirkin Write:
> > On Fri, Sep 09, 2011 at 02:43:24PM +0800, Wen Congyang wrote:
> >>> However, filtering doesn't work.  You could put a BAR outside the
> >>> filtered area and it would be visible to the guest.
> >>>
> >>
> >> I test it on real hardware. If I put a BAR outside the filterer area, and
> >> then run 'lspci -vv', the BAR does not change:
> > 
> > ...
> > 
> > 
> >> The BAR1 is feafbc00, and it is in the bus2's range.
> >> I map the BAR(mmap /sys/bus/pci/devices/0000\:03\:01.0/resource1), and find
> >> I can read and write the memory.
> >>
> >> Thanks
> >> Wen Congyang
> > 
> > So, it's as expected. Nothing seems wrong with this picture. But
> > this is not the test that Avi suggested.
> 
> Sorry for my misunderstand.
> My question is: How to put a BAR outside the filterer area,

Write into address/limit registers on the bridge to make them
not cover the BAR behind.

> and how to know
> whether it is visible?
> 
> Thanks
> Wen Congyang

Read the BAR memory as you did.
Wen Congyang - Sept. 9, 2011, 7:35 a.m.
At 09/09/2011 03:34 PM, Michael S. Tsirkin Write:
> On Fri, Sep 09, 2011 at 03:24:54PM +0800, Wen Congyang wrote:
>> At 09/09/2011 03:12 PM, Michael S. Tsirkin Write:
>>> On Fri, Sep 09, 2011 at 02:43:24PM +0800, Wen Congyang wrote:
>>>>> However, filtering doesn't work.  You could put a BAR outside the
>>>>> filtered area and it would be visible to the guest.
>>>>>
>>>>
>>>> I test it on real hardware. If I put a BAR outside the filterer area, and
>>>> then run 'lspci -vv', the BAR does not change:
>>>
>>> ...
>>>
>>>
>>>> The BAR1 is feafbc00, and it is in the bus2's range.
>>>> I map the BAR(mmap /sys/bus/pci/devices/0000\:03\:01.0/resource1), and find
>>>> I can read and write the memory.
>>>>
>>>> Thanks
>>>> Wen Congyang
>>>
>>> So, it's as expected. Nothing seems wrong with this picture. But
>>> this is not the test that Avi suggested.
>>
>> Sorry for my misunderstand.
>> My question is: How to put a BAR outside the filterer area,
> 
> Write into address/limit registers on the bridge to make them
> not cover the BAR behind.
> 
>> and how to know
>> whether it is visible?
>>
>> Thanks
>> Wen Congyang
> 
> Read the BAR memory as you did.


Thanks for your explain.
Wen Congyang

Patch

diff --git a/hw/pci_bridge.c b/hw/pci_bridge.c
index 464d897..df16faa 100644
--- a/hw/pci_bridge.c
+++ b/hw/pci_bridge.c
@@ -246,6 +246,8 @@  int pci_bridge_initfn(PCIDevice *dev)
                         br->bus_name);
     sec_bus->parent_dev = dev;
     sec_bus->map_irq = br->map_irq;
+    sec_bus->address_space_mem = parent->address_space_mem;
+    sec_bus->address_space_io = parent->address_space_io;
 
     QLIST_INIT(&sec_bus->child);
     QLIST_INSERT_HEAD(&parent->child, sec_bus, sibling);