Patchwork [3/3] VFIO: Direct access config reg without capability

Submitter Gavin Shan
Date March 15, 2013, 7:26 a.m.
Message ID <1363332390-12754-4-git-send-email-shangw@linux.vnet.ibm.com>
Permalink /patch/227862/
State Changes Requested

Comments

Gavin Shan - March 15, 2013, 7:26 a.m.
The config registers in [0, 0x40] are supported by VFIO. Apart
from that, the other config registers should be covered by a PCI or
PCIe capability. However, some PCI devices (be2net) have config
registers (0x7c) outside [0, 0x40] that have no corresponding PCI
or PCIe capability. VFIO returns 0x0 on reads of those registers
and drops writes to them. This causes the be2net driver to fail to
load because 0x0 is returned from its config register 0x7c.

The patch changes the behaviour so that config registers outside
[0, 0x40] that have no corresponding PCI or PCIe capability are
accessed directly.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 drivers/vfio/pci/vfio_pci_config.c |   31 ++++++++++++++++++++-----------
 1 files changed, 20 insertions(+), 11 deletions(-)
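
The behaviour change described in the message above can be modelled in
plain user-space C. This is only a sketch under stated assumptions, not
the VFIO code: struct model_dev, model_init(), model_read8(),
model_write8() and the CAP_NONE marker are hypothetical names invented
for illustration, and registers that are covered by a capability are
simply passed through here, whereas the real driver applies
per-capability handling to them.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CFG_SIZE 256u	/* conventional PCI config space size in bytes */
#define CAP_NONE 0xffu	/* hypothetical marker: no capability owns this dword */

/* Hypothetical stand-in for a device and its capability map. */
struct model_dev {
	uint8_t cfg[CFG_SIZE];		/* pretend physical config registers */
	uint8_t map[CFG_SIZE / 4];	/* one capability ID per dword */
};

/* Start with all registers zero and no dword owned by a capability. */
static void model_init(struct model_dev *d)
{
	memset(d->cfg, 0, sizeof(d->cfg));
	memset(d->map, CAP_NONE, sizeof(d->map));
}

/* direct == false models the old behaviour, direct == true the patch. */
static uint8_t model_read8(const struct model_dev *d, unsigned off, bool direct)
{
	if (off >= CFG_SIZE)
		return 0;
	if (d->map[off / 4] == CAP_NONE && !direct)
		return 0;	/* old: uncovered registers read as zero */
	return d->cfg[off];
}

static void model_write8(struct model_dev *d, unsigned off, uint8_t val,
			 bool direct)
{
	if (off >= CFG_SIZE)
		return;
	if (d->map[off / 4] == CAP_NONE && !direct)
		return;		/* old: write is silently dropped */
	d->cfg[off] = val;
}
```

With a non-zero value stored at offset 0x7c of the model device, a read
with direct == false yields 0, which is the situation the be2net driver
runs into, while a read with direct == true yields the stored value.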
Alex Williamson - March 15, 2013, 7:41 p.m.
On Fri, 2013-03-15 at 15:26 +0800, Gavin Shan wrote:
> The config registers in [0, 0x40] are supported by VFIO. Apart
> from that, the other config registers should be covered by a PCI or
> PCIe capability. However, some PCI devices (be2net) have config
> registers (0x7c) outside [0, 0x40] that have no corresponding PCI
> or PCIe capability. VFIO returns 0x0 on reads of those registers
> and drops writes to them. This causes the be2net driver to fail to
> load because 0x0 is returned from its config register 0x7c.
> 
> The patch changes the behaviour so that config registers outside
> [0, 0x40] that have no corresponding PCI or PCIe capability are
> accessed directly.

This basically gives userspace free access to any regions that aren't
covered by known capabilities.  We have no idea what this might expose
on some devices.  I'd like to support be2net, but what's the minimal
access that it needs?  Can we provide 2 or 4 bytes of read-only access
at offset 0x7c for just that device?  Is it always 0x7c?  Let's split
this patch from the series since it's clearly dealing with something
independent.  Thanks,

Alex

Gavin Shan - March 16, 2013, 3:34 a.m.
On Fri, Mar 15, 2013 at 01:41:08PM -0600, Alex Williamson wrote:
>On Fri, 2013-03-15 at 15:26 +0800, Gavin Shan wrote:
>> The config registers in [0, 0x40] are supported by VFIO. Apart
>> from that, the other config registers should be covered by a PCI or
>> PCIe capability. However, some PCI devices (be2net) have config
>> registers (0x7c) outside [0, 0x40] that have no corresponding PCI
>> or PCIe capability. VFIO returns 0x0 on reads of those registers
>> and drops writes to them. This causes the be2net driver to fail to
>> load because 0x0 is returned from its config register 0x7c.
>> 
>> The patch changes the behaviour so that config registers outside
>> [0, 0x40] that have no corresponding PCI or PCIe capability are
>> accessed directly.
>
>This basically gives userspace free access to any regions that aren't
>covered by known capabilities.  We have no idea what this might expose
>on some devices.  I'd like to support be2net, but what's the minimal
>access that it needs?  Can we provide 2 or 4 bytes of read-only access
>at offset 0x7c for just that device?  Is it always 0x7c?  Let's split
>this patch from the series since it's clearly dealing with something
>independent.  Thanks,
>

0x7c is just one example. Actually, the benet driver also needs to access
other config registers not covered by capabilities, such as 0x58/0xf0/0xfc,
in order to make the device work well. All of those uncovered config
registers are really the business of the specific device itself. I think we
need not concern ourselves with their access attributes. So exporting those
uncovered registers to user space might be the reasonable choice.

If we really want to control the access attributes of those uncovered
registers, we might introduce some mechanism that checks the vendor/device ID
and allows reads/writes to the uncovered registers according to the specified
bits. All of that requires fully understanding the usage of those uncovered
registers.

Yes, I will split this one from the patchset.

Thanks,
Gavin
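
The vendor/device ID mechanism mentioned in the message above could look
roughly like the following allowlist lookup. This is a sketch under
stated assumptions, not code from VFIO: struct cfg_quirk, the quirks[]
table and quirk_allows() are hypothetical, the 0x1234/0x5678 IDs are
placeholders rather than real be2net IDs, and the 4-byte widths and
read-only attributes are guesses, since the thread gives only the
offsets.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one uncovered register window. */
struct cfg_quirk {
	uint16_t vendor;	/* placeholder vendor ID */
	uint16_t device;	/* placeholder device ID */
	uint16_t offset;	/* first byte of the window */
	uint8_t  len;		/* window length in bytes (assumed) */
	bool     writable;	/* whether writes are passed through (assumed) */
};

/* Offsets come from the thread; every other field is illustrative. */
static const struct cfg_quirk quirks[] = {
	{ 0x1234, 0x5678, 0x58, 4, false },
	{ 0x1234, 0x5678, 0x7c, 4, false },
	{ 0x1234, 0x5678, 0xf0, 4, false },
	{ 0x1234, 0x5678, 0xfc, 4, false },
};

/* Return true if the access [off, off + count) is allowed for this device. */
static bool quirk_allows(uint16_t vendor, uint16_t device,
			 unsigned off, unsigned count, bool iswrite)
{
	for (size_t i = 0; i < sizeof(quirks) / sizeof(quirks[0]); i++) {
		const struct cfg_quirk *q = &quirks[i];

		if (q->vendor != vendor || q->device != device)
			continue;
		if (off < q->offset ||
		    off + count > (unsigned)q->offset + q->len)
			continue;	/* access falls outside this window */
		if (iswrite && !q->writable)
			continue;	/* window is read-only */
		return true;
	}
	return false;
}
```

As the message notes, filling in such a table properly requires knowing
what each register does, which is the main cost of this approach
compared with exporting the uncovered registers directly.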

Benjamin Herrenschmidt - March 16, 2013, 5:30 a.m.
On Fri, 2013-03-15 at 13:41 -0600, Alex Williamson wrote:
> 
> This basically gives userspace free access to any regions that aren't
> covered by known capabilities. 

And ?

I mean seriously :-) We already had that discussion ... trying to
"protect" config space is just plain doomed. There is no point.

It makes sense to do things like emulate BARs etc... for things to
function properly under some circumstances/setups where you can't just
expose the original BAR values to the guest and have the HW take care of
it but you *must* be prepared to deal with anything in config space
being changed without you knowing about it.

Devices *will* have backdoors via MMIO. Period. You cannot rely on those
not existing, whether they are documented or not.

If you can't cope with the config space accesses then you aren't
properly isolated. It can be deemed acceptable (depends what you use
your VMs for) but what I mean is that any config space
filtering/emulation for the sake of "security" is ... pointless.

Doing it for functionality to work at all (ie BAR emulation) is fine,
but that's about it. IE. As a means of security it's pointless.


>  We have no idea what this might expose on some devices.

No more than we have any idea what MMIO mapping of the device register
space exposes :-)

>   I'd like to support be2net, but what's the minimal
> access that it needs?  Can we provide 2 or 4 bytes of read-only access
> at offset 0x7c for just that device?  Is it always 0x7c?  Let's split
> this patch from the series since it's clearly dealing with something
> independent.  Thanks,

Ben.
Alex Williamson - March 18, 2013, 9:15 p.m.
On Sat, 2013-03-16 at 06:30 +0100, Benjamin Herrenschmidt wrote:
> On Fri, 2013-03-15 at 13:41 -0600, Alex Williamson wrote:
> > 
> > This basically gives userspace free access to any regions that aren't
> > covered by known capabilities. 
> 
> And ?
> 
> I mean seriously :-) We already had that discussion ... trying to
> "protect" config space is just plain doomed. There is no point.
> 
> It makes sense to do things like emulate BARs etc... for things to
> function properly under some circumstances/setups where you can't just
> expose the original BAR values to the guest and have the HW take care of
> it but you *must* be prepared to deal with anything in config space
> being changed without you knowing about it.
> 
> Devices *will* have backdoors via MMIO. Period. You cannot rely on those
> not existing, whether they are documented or not.
> 
> If you can't cope with the config space accesses then you aren't
> properly isolated. It can be deemed acceptable (depends what you use
> your VMs for) but what I mean is that any config space
> filtering/emulation for the sake of "security" is ... pointless.
> 
> Doing it for functionality to work at all (ie BAR emulation) is fine,
> but that's about it. IE. As a means of security it's pointless.
> 
> 
> >  We have no idea what this might expose on some devices.
> 
> No more than we have any idea what MMIO mapping of the device register
> space exposes :-)

Yeah, yeah.  Ok, I can't come up with a reasonable argument otherwise,
it'll give us better device support, and I believe pci-assign has always
done this.  I'll take another look at the patch.  Thanks,

Alex
Alex Williamson - March 21, 2013, 12:58 a.m.
On Fri, 2013-03-15 at 15:26 +0800, Gavin Shan wrote:
> The config registers in [0, 0x40] are supported by VFIO. Apart
> from that, the other config registers should be covered by a PCI or
> PCIe capability. However, some PCI devices (be2net) have config
> registers (0x7c) outside [0, 0x40] that have no corresponding PCI
> or PCIe capability. VFIO returns 0x0 on reads of those registers
> and drops writes to them. This causes the be2net driver to fail to
> load because 0x0 is returned from its config register 0x7c.
> 
> The patch changes the behaviour so that config registers outside
> [0, 0x40] that have no corresponding PCI or PCIe capability are
> accessed directly.
> 
> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
> ---

Hi Gavin,

I'm on board with making this change now, but this patch isn't
sufficient.  The config space map uses a byte per dword to index the
capability, since both standard and extended capabilities are dword
aligned.  We currently have a bug, which this patch exposes, where we
round the length down, e.g. a 14 byte MSI capability becomes 12 bytes,
leaving the message data exposed and writable with this patch.  That
bug can be fixed by aligning the length so the capability fills the
dword, but notice that 0x7c on the be2net is filling one of these gaps.
So fixing that bug attaches that gap to the previous capability instead
of allowing direct access.

So, before we can make this change we need to fix the config map to have
byte granularity.  Thanks,

Alex
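
The granularity problem described in the message above comes down to
simple arithmetic, which the following user-space sketch works through.
It is not the VFIO implementation: the helper names, the CAP_NONE
marker and the placement of the MSI capability at offset 0x50 are
assumptions made for illustration. Only the 14-byte length and the MSI
capability ID (0x05) are taken as given.

```c
#include <stdint.h>
#include <string.h>

#define CFG_SIZE 256u	/* conventional PCI config space size in bytes */
#define CAP_NONE 0xffu	/* hypothetical marker: not owned by a capability */
#define CAP_MSI  0x05u	/* PCI capability ID for MSI */

/* Reset a map of n entries so that no entry is owned by a capability. */
static void map_clear(uint8_t *map, unsigned n)
{
	memset(map, CAP_NONE, n);
}

/* Dword map, length rounded down: a trailing partial dword stays unowned. */
static int fill_dword_round_down(uint8_t *map, unsigned pos, unsigned len,
				 uint8_t cap)
{
	if (pos % 4 || pos + len > CFG_SIZE)
		return -1;
	memset(map + pos / 4, cap, len / 4);
	return 0;
}

/* Dword map, length rounded up: the gap after the capability is absorbed. */
static int fill_dword_round_up(uint8_t *map, unsigned pos, unsigned len,
			       uint8_t cap)
{
	if (pos % 4 || pos + len > CFG_SIZE)
		return -1;
	memset(map + pos / 4, cap, (len + 3) / 4);
	return 0;
}

/* Byte map: exactly the bytes of the capability are owned, nothing else. */
static int fill_byte(uint8_t *map, unsigned pos, unsigned len, uint8_t cap)
{
	if (pos + len > CFG_SIZE)
		return -1;
	memset(map + pos, cap, len);
	return 0;
}
```

For a 14-byte capability at the assumed offset 0x50, rounding down marks
only 0x50-0x5b as owned, so the last two bytes at 0x5c-0x5d (the message
data in a 64-bit MSI layout) look uncovered. Rounding up marks 0x50-0x5f,
which also claims the two gap bytes 0x5e-0x5f. Only the byte map marks
exactly 0x50-0x5d.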


Patch

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 964ff22..5ea3afb 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -1471,18 +1471,27 @@  static ssize_t vfio_config_do_rw(struct vfio_pci_device *vdev, char __user *buf,
 
 	cap_id = vdev->pci_config_map[*ppos / 4];
 
+	/*
+	 * Some PCI device config registers might not be covered by
+	 * any capability, yet still be useful. We will enable direct
+	 * access to those registers.
+	 */
 	if (cap_id == PCI_CAP_ID_INVALID) {
-		if (iswrite)
-			return ret; /* drop */
-
-		/*
-		 * Per PCI spec 3.0, section 6.1, reads from reserved and
-		 * unimplemented registers return 0
-		 */
-		if (copy_to_user(buf, &val, count))
-			return -EFAULT;
-
-		return ret;
+		if (iswrite) {
+			if (copy_from_user(&val, buf, count))
+				return -EFAULT;
+			ret = vfio_user_config_write(vdev->pdev, (int)(*ppos),
+						     val, count);
+			return ret ? ret : count;
+		} else {
+			ret = vfio_user_config_read(vdev->pdev, (int)(*ppos),
+						    &val, count);
+			if (ret)
+				return ret;
+			if (copy_to_user(buf, &val, count))
+				return -EFAULT;
+			return count;
+		}
 	}
 
 	/*