diff mbox series

[v5,16/16] x86/tdx: Add cmdline option to force use of ioremap_host_shared

Message ID 20211009003711.1390019-17-sathyanarayanan.kuppuswamy@linux.intel.com
State New
Headers show
Series Add TDX Guest Support (shared-mm support) | expand

Commit Message

Kuppuswamy Sathyanarayanan Oct. 9, 2021, 12:37 a.m. UTC
Add a command line option to force all the enabled drivers to use
shared memory mappings. This will be useful when enabling new drivers
in the confidential guest without making all the required changes to
use shared mappings in it.

Note that this might also allow other non explicitly enabled drivers
to interact with the host, which could cause other security risks.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---
 .../admin-guide/kernel-parameters.rst         |  1 +
 .../admin-guide/kernel-parameters.txt         | 12 ++++++++++++
 arch/x86/include/asm/io.h                     |  2 ++
 arch/x86/mm/ioremap.c                         | 19 ++++++++++++++++++-
 4 files changed, 33 insertions(+), 1 deletion(-)

Comments

Randy Dunlap Oct. 9, 2021, 1:45 a.m. UTC | #1
On 10/8/21 5:37 PM, Kuppuswamy Sathyanarayanan wrote:
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 91ba391f9b32..0af19cb1a28c 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2076,6 +2076,18 @@
>   			1 - Bypass the IOMMU for DMA.
>   			unset - Use value of CONFIG_IOMMU_DEFAULT_PASSTHROUGH.
>   
> +	ioremap_force_shared= [X86_64, CCG]
> +			Force the kernel to use shared memory mappings which do
> +			not use ioremap_host_shared/pcimap_host_shared to opt-in
> +			to shared mappings with the host. This feature is mainly
> +			used by a confidential guest when enabling new drivers
> +			without proper shared memory related changes. Please note
> +			that this option might also allow other non explicitly
> +			enabled drivers to interact with the host in confidential
> +			guest, which could cause other security risks. This option
> +			will also cause BIOS data structures to be shared with the
> +			host, which might open security holes.

Hi,
This cmdline option text should have a little bit more info. Just as an
example/template:

	acpi_apic_instance=	[ACPI, IOAPIC]
			Format: <int>
			2: use 2nd APIC table, if available
			1,0: use 1st APIC table
			default: 0

So what is expected after the "=" sign?...

thanks.
Kuppuswamy Sathyanarayanan Oct. 9, 2021, 2:10 a.m. UTC | #2
On 10/8/21 6:45 PM, Randy Dunlap wrote:
> Hi,
> This cmdline option text should have a little bit more info. Just as an
> example/template:
> 
>      acpi_apic_instance=    [ACPI, IOAPIC]
>              Format: <int>
>              2: use 2nd APIC table, if available
>              1,0: use 1st APIC table
>              default: 0
> 
> So what is expected after the "=" sign?...

It does not take any arguments. I will remove the = sign in next version.
Michael S. Tsirkin Oct. 9, 2021, 11:04 a.m. UTC | #3
On Fri, Oct 08, 2021 at 05:37:11PM -0700, Kuppuswamy Sathyanarayanan wrote:
> +	ioremap_force_shared= [X86_64, CCG]
> +			Force the kernel to use shared memory mappings which do
> +			not use ioremap_host_shared/pcimap_host_shared to opt-in
> +			to shared mappings with the host. This feature is mainly
> +			used by a confidential guest when enabling new drivers
> +			without proper shared memory related changes. Please note
> +			that this option might also allow other non explicitly
> +			enabled drivers to interact with the host in confidential
> +			guest, which could cause other security risks. This option
> +			will also cause BIOS data structures to be shared with the
> +			host, which might open security holes.
> +
>  	io7=		[HW] IO7 for Marvel-based Alpha systems
>  			See comment before marvel_specify_io7 in
>  			arch/alpha/kernel/core_marvel.c.

The connection is quite unfortunate IMHO.
Can't there be an option
that unbreaks drivers *without* opening up security holes by
making BIOS shared?
Andi Kleen Oct. 11, 2021, 2:39 a.m. UTC | #4
> The connection is quite unfortunate IMHO.
> Can't there be an option
> that unbreaks drivers *without* opening up security holes by
> making BIOS shared?

That would require new low level APIs that distinguish both cases, and a 
tree sweep.


-Andi
Michael S. Tsirkin Oct. 11, 2021, 12:04 p.m. UTC | #5
On Sun, Oct 10, 2021 at 07:39:55PM -0700, Andi Kleen wrote:
> 
> > The connection is quite unfortunate IMHO.
> > Can't there be an option
> > that unbreaks drivers *without* opening up security holes by
> > making BIOS shared?
> 
> That would require new low level APIs that distinguish both cases, and a
> tree sweep.
> 
> 
> -Andi

Presumably bios code is in arch/x86 and drivers/acpi, right?
Up to 200 calls the majority of which is likely private ...

I don't have better ideas but the current setup will just
result in people making their guests vulnerable whenever they
want to allow device pass-through.
Andi Kleen Oct. 11, 2021, 5:35 p.m. UTC | #6
> Presumably bios code is in arch/x86 and drivers/acpi, right?
> Up to 200 calls the majority of which is likely private ...

Yes.

> I don't have better ideas but the current setup will just
> result in people making their guests vulnerable whenever they
> want to allow device pass-through.


Yes that's true. For current TDX our target is virtual devices only. But 
if pass through usage will be really wide spread we may need to revisit.


-Andi
Michael S. Tsirkin Oct. 11, 2021, 6:28 p.m. UTC | #7
On Mon, Oct 11, 2021 at 10:35:18AM -0700, Andi Kleen wrote:
> 
> > Presumably bios code is in arch/x86 and drivers/acpi, right?
> > Up to 200 calls the majority of which is likely private ...
> 
> Yes.
> 
> > I don't have better ideas but the current setup will just
> > result in people making their guests vulnerable whenever they
> > want to allow device pass-through.
> 
> 
> Yes that's true. For current TDX our target is virtual devices only. But if
> pass through usage will be really wide spread we may need to revisit.
> 
> 
> -Andi

I mean ... it's already wide spread. If we support it with TDX
it will be used with TDX. If we don't then I guess it won't,
exposing this kind of limitation in a userspace visible way isn't great
though. I guess it boils down to the fact that
ioremap_host_shared is just not a great interface, users simply
have no idea whether a given driver uses ioremap.
Andi Kleen Oct. 12, 2021, 5:55 p.m. UTC | #8
> I mean ... it's already wide spread.


I meant wide spread usage with confidential guests.

> If we support it with TDX
> it will be used with TDX.

It has some security trade offs. The main reason to use TDX is security. 
Also when people take the VT-d tradeoffs they might be ok with the BIOS 
trade offs too.

-Andi
Michael S. Tsirkin Oct. 12, 2021, 8:59 p.m. UTC | #9
On Tue, Oct 12, 2021 at 10:55:20AM -0700, Andi Kleen wrote:
> 
> > I mean ... it's already wide spread.
> 
> 
> I meant wide spread usage with confidential guests.
> 
> > If we support it with TDX
> > it will be used with TDX.
> 
> It has some security trade offs. The main reason to use TDX is security.
> Also when people take the VT-d tradeoffs they might be ok with the BIOS
> trade offs too.
> 
> -Andi

Interesting. VT-d tradeoffs ... what are they?
Allowing hypervisor to write into BIOS looks like it will
trivially lead to code execution, won't it?
Andi Kleen Oct. 12, 2021, 9:18 p.m. UTC | #10
> Interesting. VT-d tradeoffs ... what are they?

The connection to the device is not encrypted and also not authenticated.

This is different that even talking to the (untrusted) host through 
shared memory where you at least still have a common key.

> Allowing hypervisor to write into BIOS looks like it will
> trivially lead to code execution, won't it?

This is not about BIOS code executing. While the guest firmware runs it 
is protected of course. This is for BIOS structures like ACPI tables 
that are mapped by Linux. While AML can run byte code it can normally 
not write to arbitrary memory.

The risk is more that all the Linux code dealing with this hasn't been 
hardened to deal with malicious input.

-Andi
Michael S. Tsirkin Oct. 12, 2021, 9:30 p.m. UTC | #11
On Tue, Oct 12, 2021 at 02:18:01PM -0700, Andi Kleen wrote:
> 
> > Interesting. VT-d tradeoffs ... what are they?
> 
> The connection to the device is not encrypted and also not authenticated.
> 
> This is different that even talking to the (untrusted) host through shared
> memory where you at least still have a common key.

Well it's different sure enough but how is talking to host less secure?
Cold boot attacks and such?

> > Allowing hypervisor to write into BIOS looks like it will
> > trivially lead to code execution, won't it?
> 
> This is not about BIOS code executing. While the guest firmware runs it is
> protected of course. This is for BIOS structures like ACPI tables that are
> mapped by Linux. While AML can run byte code it can normally not write to
> arbitrary memory.

I thought you basically create an OperationRegion of SystemMemory type,
and off you go. Maybe the OSPM in Linux is clever and protects
some memory, I wouldn't know.

> The risk is more that all the Linux code dealing with this hasn't been
> hardened to deal with malicious input.
> 
> -Andi
Andi Kleen Oct. 15, 2021, 5:50 a.m. UTC | #12
> I thought you basically create an OperationRegion of SystemMemory type,
> and off you go. Maybe the OSPM in Linux is clever and protects
> some memory, I wouldn't know.


I investigated this now, and it looks like acpi is using 
ioremap_cache(). We can hook into that and force non sharing. It's 
probably safe to assume that this is not used on real IO devices.

I think there are still some other BIOS mappings that use just plain 
ioremap() though.


-Andi
Michael S. Tsirkin Oct. 15, 2021, 6:57 a.m. UTC | #13
On Thu, Oct 14, 2021 at 10:50:59PM -0700, Andi Kleen wrote:
> 
> > I thought you basically create an OperationRegion of SystemMemory type,
> > and off you go. Maybe the OSPM in Linux is clever and protects
> > some memory, I wouldn't know.
> 
> 
> I investigated this now, and it looks like acpi is using ioremap_cache(). We
> can hook into that and force non sharing. It's probably safe to assume that
> this is not used on real IO devices.
> 
> I think there are still some other BIOS mappings that use just plain
> ioremap() though.
> 
> 
> -Andi

Hmm don't you mean the reverse? If you make ioremap shared then OS is
protected from malicious ACPI? If you don't make it shared then
malicious ACPI can poke at arbitrary OS memory.  Looks like making
ioremap non shared by default is actually less safe than shared.
Interesting.

For BIOS I suspect there's no way around it, it needs to be
audited since it's executable.
diff mbox series

Patch

diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst
index 01ba293a2d70..02e6aae1ad68 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -102,6 +102,7 @@  parameter is applicable::
 	ARM	ARM architecture is enabled.
 	ARM64	ARM64 architecture is enabled.
 	AX25	Appropriate AX.25 support is enabled.
+	CCG	Confidential Computing guest is enabled.
 	CLK	Common clock infrastructure is enabled.
 	CMA	Contiguous Memory Area support is enabled.
 	DRM	Direct Rendering Management support is enabled.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 91ba391f9b32..0af19cb1a28c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2076,6 +2076,18 @@ 
 			1 - Bypass the IOMMU for DMA.
 			unset - Use value of CONFIG_IOMMU_DEFAULT_PASSTHROUGH.
 
+	ioremap_force_shared= [X86_64, CCG]
+			Force the kernel to use shared memory mappings which do
+			not use ioremap_host_shared/pcimap_host_shared to opt-in
+			to shared mappings with the host. This feature is mainly
+			used by a confidential guest when enabling new drivers
+			without proper shared memory related changes. Please note
+			that this option might also allow other non explicitly
+			enabled drivers to interact with the host in confidential
+			guest, which could cause other security risks. This option
+			will also cause BIOS data structures to be shared with the
+			host, which might open security holes.
+
 	io7=		[HW] IO7 for Marvel-based Alpha systems
 			See comment before marvel_specify_io7 in
 			arch/alpha/kernel/core_marvel.c.
diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 521b239c013f..98836c2833e4 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -423,6 +423,8 @@  static inline bool phys_mem_access_encrypted(unsigned long phys_addr,
 }
 #endif
 
+extern bool ioremap_force_shared;
+
 /**
  * iosubmit_cmds512 - copy data to single MMIO location, in 512-bit units
  * @dst: destination, in MMIO space (must be 512-bit aligned)
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index a83a69045f61..d0d2bf5116bc 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -28,6 +28,7 @@ 
 #include <asm/memtype.h>
 #include <asm/setup.h>
 #include <asm/tdx.h>
+#include <asm/cmdline.h>
 
 #include "physaddr.h"
 
@@ -162,6 +163,17 @@  static void __ioremap_check_mem(resource_size_t addr, unsigned long size,
 	__ioremap_check_other(addr, desc);
 }
 
+/*
+ * Normally only drivers that are hardened for use in confidential guests
+ * force shared mappings. But if device filtering is disabled other
+ * devices can be loaded, and these need shared mappings too. This
+ * variable is set to true if these filters are disabled.
+ *
+ * Note this has some side effects, e.g. various BIOS tables
+ * get shared too which is risky.
+ */
+bool ioremap_force_shared;
+
 /*
  * Remap an arbitrary physical address space into the kernel virtual
  * address space. It transparently creates kernel huge I/O mapping when
@@ -249,7 +261,7 @@  __ioremap_caller(resource_size_t phys_addr, unsigned long size,
 	prot = PAGE_KERNEL_IO;
 	if ((io_desc.flags & IORES_MAP_ENCRYPTED) || encrypted)
 		prot = pgprot_encrypted(prot);
-	else if (shared)
+	else if (shared || ioremap_force_shared)
 		prot = pgprot_cc_guest(prot);
 
 	switch (pcm) {
@@ -847,6 +859,11 @@  void __init early_ioremap_init(void)
 	WARN_ON((fix_to_virt(0) + PAGE_SIZE) & ((1 << PMD_SHIFT) - 1));
 #endif
 
+	/* Parse cmdline params for ioremap_force_shared */
+	if (cmdline_find_option_bool(boot_command_line,
+				     "ioremap_force_shared"))
+		ioremap_force_shared = 1;
+
 	early_ioremap_setup();
 
 	pmd = early_ioremap_pmd(fix_to_virt(FIX_BTMAP_BEGIN));