Message ID: 20240222-gunyah-v17-0-1e9da6763d38@quicinc.com
Series: Drivers for Gunyah hypervisor
On Thu, Feb 22, 2024 at 03:16:42PM -0800, Elliot Berman wrote:
> Firmware and hypervisor drivers can donate system heap memory to their
> respective firmware/hypervisor entities. Those drivers should unmap the
> pages from the kernel's logical map before doing so.
>
> Export can_set_direct_map, set_direct_map_invalid_noflush, and
> set_direct_map_default_noflush.

Err, no they should not. And not by using such super low-level
interfaces from modular code.
On 23.02.2024 00:16, Elliot Berman wrote:
> Add driver to detect when running under Gunyah. It performs basic
> identification hypercall and populates the platform bus for resource
> manager to probe.
>
> Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
> ---

[...]

> +
> +	/* Might move this out to individual drivers if there's ever an API version bump */
> +	if (gunyah_api_version(&gunyah_api) != GUNYAH_API_V1) {
> +		pr_info("Unsupported Gunyah version: %u\n",
> +			gunyah_api_version(&gunyah_api));

Weird for this not to be an error, but it's probably not worth resending
over if it's the only thing.

Konrad
On Fri, Feb 23, 2024 at 10:10:47PM +0100, Konrad Dybcio wrote:
> On 23.02.2024 00:16, Elliot Berman wrote:
> > Add driver to detect when running under Gunyah. It performs basic
> > identification hypercall and populates the platform bus for resource
> > manager to probe.

[...]

> Weird for this not to be an error, but it's probably not worth resending
> over if it's the only thing

It is an error, but maybe I misunderstood:

> > +	/* Might move this out to individual drivers if there's ever an API version bump */
> > +	if (gunyah_api_version(&gunyah_api) != GUNYAH_API_V1) {
> > +		pr_info("Unsupported Gunyah version: %u\n",
> > +			gunyah_api_version(&gunyah_api));
> > +		return -ENODEV;
> > +	}
On 23.02.2024 23:58, Elliot Berman wrote:
[...]
> It is an error, but maybe I misunderstood:

Sorry, I meant "pr_info might have been pr_err".

Konrad
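Putting Konrad's nit together with the error return that was already in the patch, the resolved probe check would read roughly as follows (a sketch only; `gunyah_api`, `gunyah_api_version()`, and `GUNYAH_API_V1` are from the series under review):

```c
	/* Might move this out to individual drivers if there's ever an API version bump */
	if (gunyah_api_version(&gunyah_api) != GUNYAH_API_V1) {
		/* An unsupported hypervisor ABI is a probe failure, so log
		 * at error severity rather than pr_info(). */
		pr_err("Unsupported Gunyah version: %u\n",
		       gunyah_api_version(&gunyah_api));
		return -ENODEV;
	}
```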
On Thu, Feb 22, 2024 at 11:09:40PM -0800, Christoph Hellwig wrote:
> On Thu, Feb 22, 2024 at 03:16:42PM -0800, Elliot Berman wrote:
> > Firmware and hypervisor drivers can donate system heap memory to their
> > respective firmware/hypervisor entities. Those drivers should unmap the
> > pages from the kernel's logical map before doing so.
> >
> > Export can_set_direct_map, set_direct_map_invalid_noflush, and
> > set_direct_map_default_noflush.
>
> Err, no they should not. And not by using such super low-level
> interfaces from modular code.

Hi Christoph,

We've observed a few times that Linux can unintentionally access a page
we've unmapped from the host's stage 2 page table via an unaligned load
from an adjacent page. The stage 2 is managed by Gunyah. There are a few
scenarios where, even though we allocate and own a page from buddy,
someone else could try to access the page without going through the
hypervisor driver. One such instance we know about is
load_unaligned_zeropad() via path_lookupat() [1].

load_unaligned_zeropad() could be called near the end of a page. If the
next page isn't mapped by the kernel in the stage 1 page tables, then
the access to the unmapped page from load_unaligned_zeropad() will land
in __do_kernel_fault(), call fixup_exception(), and fill the remainder
of the load with zeroes. If the page in question is mapped in stage 1
but was unmapped from stage 2, then the access lands back in Linux in
do_sea(), leading to a panic().

Our preference would be to add fixup_exception() handling to S2 PTW
errors, for two reasons:

1. It's cheaper performance-wise: we've already manipulated the S2 page
   table, and intentional access to the page is already prevented
   because the pKVM/Gunyah drivers know that access to the page has
   been lost.
2. Page-granular S1 mappings only happen on arm64 with rodata=full.

In an off-list discussion with the Android pKVM folks, their preference
was to have the pages unmapped from stage 1. I've gone with that
approach to get started but welcome discussion on the best approach.

The Android (downstream) implementation of arm64 pKVM is currently
implementing a hack where S2 PTW faults are given back to the host as S1
PTW faults (i.e. __do_kernel_fault() gets called and not do_sea()) --
allowing the kernel to fix up the exception.

arm64 pKVM will also face this issue when implementing guest_memfd or
when donating more memory to the hyp for S2 page tables, etc. As far as
I can tell, this isn't an issue for arm64 pKVM today because memory
isn't being dynamically donated to the hypervisor.

Thanks,
Elliot

[1]:
 path_lookupat+0x340/0x3228
 filename_lookup+0xbc/0x1c0
 __arm64_sys_newfstatat+0xb0/0x4a0
 invoke_syscall+0x58/0x118
The point is that we can't just allow modules to unmap data from the
kernel mapping, no matter how noble your intentions are.
On 26.02.24 12:06, Christoph Hellwig wrote:
> The point is that we can't just allow modules to unmap data from
> the kernel mapping, no matter how noble your intentions are.

I absolutely agree.
On Mon, Feb 26, 2024 at 12:53:48PM +0100, David Hildenbrand wrote:
> On 26.02.24 12:06, Christoph Hellwig wrote:
> > The point is that we can't just allow modules to unmap data from
> > the kernel mapping, no matter how noble your intentions are.
>
> I absolutely agree.

Hi David and Christoph,

Are your preferences that we should make Gunyah builtin-only, or that we
should add fixup of S2 PTW errors (or something else)?

Also, do you extend that preference to modifying S2 mappings? This would
require any hypervisor driver that supports confidential compute use
cases to only ever be builtin.

Is your concern about unmapping data from the kernel mapping, then the
module being unloaded, and then having no way to recover the mapping?
Would a permanent module be better? The primary reason we wanted to have
it as a module was to avoid having the driver in memory if you're not a
Gunyah guest.

Thanks,
Elliot
On 26.02.24 18:27, Elliot Berman wrote:
[...]
> Are your preferences that we should make Gunyah builtin only or should add
> fixing up S2 PTW errors (or something else)?

Having that built into the kernel certainly does sound better than
exposing that functionality to arbitrary OOT modules. But still, this
feels like it is using a "too-low-level" interface.

> Also, do you extend that preference to modifying S2 mappings? This would
> require any hypervisor driver that supports confidential compute
> usecases to only ever be builtin.
>
> Is your concern about unmapping data from kernel mapping, then module
> being unloaded, and then having no way to recover the mapping?

What I didn't grasp from this patch description: is the area where a
driver would unmap/remap that memory somehow known ahead of time and
limited?

How would the driver obtain the memory whose direct map it would try to
unmap/remap? Simply allocate some pages and then unmap the direct map?

For example, we do have mm/secretmem.c, where we unmap the directmap on
allocation and remap when freeing a page. A nice abstraction on
alloc/free, so one cannot really do a lot of harm.

Further, we enlightened the remainder of the system about secretmem,
such that we can detect that the directmap is no longer there. As one
example, see the secretmem_active() check in kernel/power/hibernate.c.

A similar abstraction would make sense (I remember a discussion about
having secretmem functionality in guest_memfd, would that help?), but
the question is "which" memory you want to unmap the direct map of, and
how the driver became "owner" of that memory such that it would really
be allowed to mess with the directmap.
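The secretmem-style abstraction being suggested pairs allocation with direct-map removal and freeing with direct-map restore, so callers never touch the low-level set_direct_map_*() interfaces themselves. A minimal sketch (not from the series; the gunyah_* names are hypothetical), loosely modeled on mm/secretmem.c:

```c
/* Hypothetical pairing sketched after mm/secretmem.c: the direct map is
 * removed on allocation and restored on free, so no caller ever uses
 * set_direct_map_*() directly. */
static struct page *gunyah_alloc_donatable_page(gfp_t gfp)
{
	struct page *page = alloc_page(gfp);
	unsigned long addr;

	if (!page)
		return NULL;

	if (set_direct_map_invalid_noflush(page)) {
		__free_page(page);
		return NULL;
	}
	addr = (unsigned long)page_address(page);
	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
	return page;
}

static void gunyah_free_donatable_page(struct page *page)
{
	/* Restore the kernel mapping before the page returns to buddy. */
	set_direct_map_default_noflush(page);
	__free_page(page);
}
```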
On Tue, Feb 27, 2024 at 10:49:32AM +0100, David Hildenbrand wrote:
[...]
> Having that built into the kernel certainly does sound better than exposing
> that functionality to arbitrary OOT modules. But still, this feels like it
> is using a "too-low-level" interface.

What are your thoughts about fixing up the stage-2 fault instead? I
think this gives MMU-based isolation a slight speed boost because we
avoid modifying the kernel mapping. The hypervisor driver (KVM or
Gunyah) knows that the page isn't mapped. Whether we get an S2 or S1
fault, the kernel is likely going to crash, except in the rare case
where we want to fix up the exception. In that case, we can modify the
S2 fault handler to call fixup_exception() when appropriate.

> What I didn't grasp from this patch description: is the area where a driver
> would unmap/remap that memory somehow known ahead of time and limited?
>
> How would the driver obtain that memory it would try to unmap/remap the
> direct map of? Simply allocate some pages and then unmap the direct map?

That's correct.

> For example, we do have mm/secretmem.c, where we unmap the directmap on
> allocation and remap when freeing a page. A nice abstraction on alloc/free,
> so one cannot really do a lot of harm.
>
> Further, we enlightened the remainder of the system about secretmem, such
> that we can detect that the directmap is no longer there. As one example,
> see the secretmem_active() check in kernel/power/hibernate.c.

I'll take a look at this. guest_memfd might be able to use PM notifiers
here instead, but I'll dig into the archives to see why secretmem isn't
using that.

> A similar abstraction would make sense (I remember a discussion about having
> secretmem functionality in guest_memfd, would that help?), but the question
> is "which" memory you want to unmap the direct map of, and how the driver
> became "owner" of that memory such that it would really be allowed to mess
> with the directmap.
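For context, the secretmem_active() enlightenment referenced above is the gate in kernel/power/hibernate.c (quoted from memory of mainline around the time of this thread; check the tree for the exact form). A Gunyah or guest_memfd variant would presumably need an analogous hook:

```c
/* kernel/power/hibernate.c: hibernation is refused while secretmem pages
 * exist, because their direct-map entries have been removed and the
 * hibernation image could not safely include them. */
bool hibernation_available(void)
{
	return nohibernate == 0 &&
		!security_locked_down(LOCKDOWN_HIBERNATION) &&
		!secretmem_active() && !cxl_mem_active();
}
```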
On Friday 23 Feb 2024 at 16:37:23 (-0800), Elliot Berman wrote:
[...]
> arm64 pKVM will also face this issue when implementing guest_memfd or
> when donating more memory to the hyp for s2 page tables, etc. As far as
> I can tell, this isn't an issue for arm64 pKVM today because memory
> isn't being dynamically donated to the hypervisor.

FWIW pKVM already donates memory dynamically to the hypervisor, to store
e.g. guest VM metadata and page-tables, and we've never seen that
problem as far as I can recall.

A key difference is that pKVM injects a data abort back into the kernel
in case of a stage-2 fault, so the whole EXTABLE trick/hack in
load_unaligned_zeropad() should work fine out of the box.

As discussed offline, Gunyah injecting an SEA into the kernel is
questionable, but I understand that the architecture is a bit lacking in
this department, and that's probably the next best thing.

Could the Gunyah driver allocate from a CMA region instead? That would
surely simplify unmapping from EL1 stage-1 (similar to how drivers
usually donate memory to TZ).

Thanks,
Quentin
On Mon, Mar 04, 2024 at 01:10:48PM +0000, Quentin Perret wrote:
[...]
> Could the Gunyah driver allocate from a CMA region instead? That would
> surely simplify unmapping from EL1 stage-1 (similar to how drivers
> usually donate memory to TZ).

In my opinion, CMA is overly restrictive because we'd have to define the
region up front, and we don't know how much memory the virtual machines
the user will want to launch will need.

Thanks,
Elliot
On Thu, Feb 22, 2024 at 03:16:24PM -0800, Elliot Berman wrote:
> Gunyah is an open-source Type-1 hypervisor developed by Qualcomm. It
> does not depend on any lower-privileged OS/kernel code for its core
> functionality. This increases its security and can support a smaller
> trusted computing based when compared to Type-2 hypervisors.

%s/based/base/

> Add documentation describing the Gunyah hypervisor and the main
> components of the Gunyah hypervisor which are of interest to Linux
> virtualization development.
>
> Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>

> +- Virtual platform:
> +
> +  Architectural devices such as interrupt controllers and CPU timers are
> +  directly provided by the hypervisor as well as core virtual platform devices
> +  and system APIs such as ARM PSCI.
> +
> +- Device Virtualization:
> +
> +  Para-virtualization of devices is supported using inter-VM communication and
> +  virtio transport support. Select stage 2 faults by virtual machines that use

%s/Select/Selected/

> +  proxy-scheduled vCPUs can be handled directly by Linux to provide Type-2
> +  hypervisor style on-demand paging and/or device emulation.
> +

The doc patch looks good to me.

Thanks,
Pavan
On Monday 04 Mar 2024 at 15:37:41 (-0800), Elliot Berman wrote:
[...]
> > Could the Gunyah driver allocate from a CMA region instead? That would
> > surely simplify unmapping from EL1 stage-1 (similar to how drivers
> > usually donate memory to TZ).
>
> In my opinion, CMA is overly restrictive because we'd have to define the
> region up front and we don't know how much memory the virtual machines
> the user will want to launch.

I was thinking of using CMA to allocate pages needed to store guest
metadata and such at EL2, but not to back the actual guest pages
themselves. That still means overallocating somehow, but that should
hopefully be much smaller and be less of a problem?

For the actual guest pages, the gunyah variant of guestmem will have to
unmap the pages from the direct map itself, but I'd be personally happy
with making that part non-modular to avoid the issue Christoph and
others have raised.

Thanks,
Quentin
On Tue, Mar 05, 2024 at 03:30:58PM +0000, Quentin Perret wrote:
[...]
> I was thinking of using CMA to allocate pages needed to store guest
> metadata and such at EL2, but not to back the actual guest pages
> themselves. That still means overallocating somehow, but that should
> hopefully be much smaller and be less of a problem?

Ah, I understand the context now. Yes, we might need to use a CMA region
when donating memory to Gunyah if we have to ensure the memory is
unmapped from stage 1, since we wouldn't use guest_memfd for that.

> For the actual guest pages, the gunyah variant of guestmem will have to
> unmap the pages from the direct map itself, but I'd be personally happy
> with making that part non-modular to avoid the issue Christoph and
> others have raised.

I still disagree that this is a Gunyah-specific problem. As far as we
can tell, Arm doesn't specify how EL2 can tell EL1 that its S2 page
tables couldn't give a valid translation of the IPA from stage 1. IMO,
downstream/Android pKVM is violating the spec for ESR_EL1 by using the
S1PTW bit (which is res0 for everyone except EL2 [1]), and this means
that guests need to be pKVM-enlightened. If we are adding pKVM
enlightenment in the exception handlers, can we add Gunyah
enlightenment to handle the same?

Thanks,
Elliot

[1]: https://developer.arm.com/documentation/ddi0601/2023-12/AArch64-Registers/ESR-EL1--Exception-Syndrome-Register--EL1-?lang=en#fieldset_0-24_0_16-7_7
On Tuesday 05 Mar 2024 at 12:26:59 (-0800), Elliot Berman wrote:
> I still disagree that this is a Gunyah-specific problem. As far as we
> can tell, Arm doesn't specify how EL2 can tell EL1 its S2 page tables
> couldn't give a valid translation of the IPA from stage 1. IMO,
> downstream/Android pKVM is violating the spec for ESR_EL1 by using the
> S1PTW bit (which is res0 for everyone except EL2 [1]) and this means
> that guests need to be pKVM-enlightened.

Not really, in pKVM we have a very clear distinction between host Linux
and guests, and only the host needs to be enlightened. But luckily,
since pKVM is part of Linux, this is pretty much an internal kernel
thing, so we're very flexible, and if the S1PTW trick ever conflicts
with something else (e.g. NV) we can fairly easily switch to another
approach. We can tolerate non-architectural tricks like that between
pKVM and host Linux because that is not ABI, but we certainly can't do
that for guests.

> If we are adding pKVM enlightenment in the exception handlers, can we
> add Gunyah enlightenment to handle the same?

If you mean extending the Linux SEA handler so it does what Gunyah
wants, then I'm personally not supportive of that idea, since the
'contract' between Linux and Gunyah _is_ the architecture. The only
ways I could see Gunyah delegate stage-2 fault handling to Linux
cleanly are:

 - either talk to Arm to introduce a new ESR specifically for this,
   which doesn't sound entirely crazy to me;

 - or have Gunyah and Linux negotiate the location of the handlers in
   software. That probably means SDEI or equivalent, which is a can of
   worms in itself, I presume, and I'm not sure how feasible it would
   be for this handler to live in the Gunyah driver (that too probably
   requires exporting kernel symbols we don't want to export).

Thanks,
Quentin
* Elliot Berman <quic_eberman@quicinc.com> [2024-02-22 15:16:24]: > Gunyah is an open-source Type-1 hypervisor developed by Qualcomm. It > does not depend on any lower-privileged OS/kernel code for its core > functionality. This increases its security and can support a smaller > trusted computing base when compared to Type-2 hypervisors. > > Add documentation describing the Gunyah hypervisor and the main > components of the Gunyah hypervisor which are of interest to Linux > virtualization development. > > Signed-off-by: Elliot Berman <quic_eberman@quicinc.com> LGTM. Reviewed-by: Srivatsa Vaddagiri <quic_svaddagi@quicinc.com> - vatsa
* Elliot Berman <quic_eberman@quicinc.com> [2024-02-22 15:16:28]: > Add driver to detect when running under Gunyah. It performs basic > identification hypercall and populates the platform bus for resource > manager to probe. > > Signed-off-by: Elliot Berman <quic_eberman@quicinc.com> LGTM Reviewed-by: Srivatsa Vaddagiri <quic_svaddagi@quicinc.com> - vatsa
* Elliot Berman <quic_eberman@quicinc.com> [2024-02-22 15:16:30]: > The resource manager is a special virtual machine which is always > running on a Gunyah system. It provides APIs for creating and destroying > VMs, secure memory management, sharing/lending of memory between VMs, > and setup of inter-VM communication. Calls to the resource manager are > made via message queues. > > This patch implements the basic probing and RPC mechanism to make those > API calls. Request/response calls can be made with gh_rm_call. > Drivers can also register to notifications pushed by RM via > gh_rm_register_notifier > > Specific API calls that resource manager supports will be implemented in > subsequent patches. > > Signed-off-by: Elliot Berman <quic_eberman@quicinc.com> Left a minor comment below. LGTM otherwise. Reviewed-by: Srivatsa Vaddagiri <quic_svaddagi@quicinc.com> > +static irqreturn_t gunyah_rm_rx(int irq, void *data) > +{ > + enum gunyah_error gunyah_error; > + struct gunyah_rm_rpc_hdr *hdr; > + struct gunyah_rm *rm = data; > + void *msg = &rm->recv_msg[0]; > + size_t len; > + bool ready; > + > + do { > + gunyah_error = gunyah_hypercall_msgq_recv(rm->rx_ghrsc.capid, > + msg, > + sizeof(rm->recv_msg), > + &len, &ready); > + if (gunyah_error != GUNYAH_ERROR_OK) { > + if (gunyah_error != GUNYAH_ERROR_MSGQUEUE_EMPTY) > + dev_warn(rm->dev, > + "Failed to receive data: %d\n", > + gunyah_error); > + return IRQ_HANDLED; > + } > + > + if (len < sizeof(*hdr)) { > + dev_err_ratelimited( > + rm->dev, > + "Too small message received. size=%ld\n", len); > + continue; In practice we should never hit this condition, in case we do encounter, do you see a reason why continue is preferred over simply breaking the loop? 
> + } > + > + hdr = msg; > + if (hdr->api != RM_RPC_API) { > + dev_err(rm->dev, "Unknown RM RPC API version: %x\n", > + hdr->api); > + return IRQ_HANDLED; > + } > + > + switch (FIELD_GET(RM_RPC_TYPE_MASK, hdr->type)) { > + case RM_RPC_TYPE_NOTIF: > + gunyah_rm_process_notif(rm, msg, len); > + break; > + case RM_RPC_TYPE_REPLY: > + gunyah_rm_process_reply(rm, msg, len); > + break; > + case RM_RPC_TYPE_CONTINUATION: > + gunyah_rm_process_cont(rm, rm->active_rx_message, msg, > + len); > + break; > + default: > + dev_err(rm->dev, > + "Invalid message type (%lu) received\n", > + FIELD_GET(RM_RPC_TYPE_MASK, hdr->type)); > + return IRQ_HANDLED; > + } > + } while (ready); > + > + return IRQ_HANDLED; > +}
* Elliot Berman <quic_eberman@quicinc.com> [2024-02-22 15:16:31]: > Gunyah VM manager is a kernel module which exposes an interface to > userspace to load, run, and interact with other Gunyah virtual machines. > The interface is a character device at /dev/gunyah. > > Add a basic VM manager driver. Upcoming patches will add more ioctls > into this driver. > > Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> > Reviewed-by: Alex Elder <elder@linaro.org> > Co-developed-by: Prakruthi Deepak Heragu <quic_pheragu@quicinc.com> > Signed-off-by: Prakruthi Deepak Heragu <quic_pheragu@quicinc.com> > Signed-off-by: Elliot Berman <quic_eberman@quicinc.com> LGTM Reviewed-by: Srivatsa Vaddagiri <quic_svaddagi@quicinc.com> - vatsa
* Elliot Berman <quic_eberman@quicinc.com> [2024-02-22 15:16:32]: > Add Gunyah Resource Manager RPC interfaces to launch an unauthenticated > virtual machine. > > Signed-off-by: Elliot Berman <quic_eberman@quicinc.com> LGTM Reviewed-by: Srivatsa Vaddagiri <quic_svaddagi@quicinc.com> - vatsa
On Thu, Mar 07, 2024 at 09:08:43PM +0530, Srivatsa Vaddagiri wrote: > * Elliot Berman <quic_eberman@quicinc.com> [2024-02-22 15:16:30]: > > > The resource manager is a special virtual machine which is always > > running on a Gunyah system. It provides APIs for creating and destroying > > VMs, secure memory management, sharing/lending of memory between VMs, > > and setup of inter-VM communication. Calls to the resource manager are > > made via message queues. > > > > This patch implements the basic probing and RPC mechanism to make those > > API calls. Request/response calls can be made with gh_rm_call. > > Drivers can also register to notifications pushed by RM via > > gh_rm_register_notifier > > > > Specific API calls that resource manager supports will be implemented in > > subsequent patches. > > > > Signed-off-by: Elliot Berman <quic_eberman@quicinc.com> > > Left a minor comment below. LGTM otherwise. > > Reviewed-by: Srivatsa Vaddagiri <quic_svaddagi@quicinc.com> > > > +static irqreturn_t gunyah_rm_rx(int irq, void *data) > > +{ > > + enum gunyah_error gunyah_error; > > + struct gunyah_rm_rpc_hdr *hdr; > > + struct gunyah_rm *rm = data; > > + void *msg = &rm->recv_msg[0]; > > + size_t len; > > + bool ready; > > + > > + do { > > + gunyah_error = gunyah_hypercall_msgq_recv(rm->rx_ghrsc.capid, > > + msg, > > + sizeof(rm->recv_msg), > > + &len, &ready); > > + if (gunyah_error != GUNYAH_ERROR_OK) { > > + if (gunyah_error != GUNYAH_ERROR_MSGQUEUE_EMPTY) > > + dev_warn(rm->dev, > > + "Failed to receive data: %d\n", > > + gunyah_error); > > + return IRQ_HANDLED; > > + } > > + > > + if (len < sizeof(*hdr)) { > > + dev_err_ratelimited( > > + rm->dev, > > + "Too small message received. size=%ld\n", len); > > + continue; > > In practice we should never hit this condition, in case we do encounter, do you > see a reason why continue is preferred over simply breaking the loop? > There might be more messages to read, which we would not otherwise read. 
Since those messages might be parseable, I'd rather try to recover than break communication with RM. As you mention, we should never encounter this condition. The guard is to avoid reading garbage values. > > + } > > + > > + hdr = msg; > > + if (hdr->api != RM_RPC_API) { > > + dev_err(rm->dev, "Unknown RM RPC API version: %x\n", > > + hdr->api); > > + return IRQ_HANDLED; > > + } > > + > > + switch (FIELD_GET(RM_RPC_TYPE_MASK, hdr->type)) { > > + case RM_RPC_TYPE_NOTIF: > > + gunyah_rm_process_notif(rm, msg, len); > > + break; > > + case RM_RPC_TYPE_REPLY: > > + gunyah_rm_process_reply(rm, msg, len); > > + break; > > + case RM_RPC_TYPE_CONTINUATION: > > + gunyah_rm_process_cont(rm, rm->active_rx_message, msg, > > + len); > > + break; > > + default: > > + dev_err(rm->dev, > > + "Invalid message type (%lu) received\n", > > + FIELD_GET(RM_RPC_TYPE_MASK, hdr->type)); > > + return IRQ_HANDLED; > > + } > > + } while (ready); > > + > > + return IRQ_HANDLED; > > +}
On Wed, Mar 06, 2024 at 12:05:38PM +0000, Quentin Perret wrote: > On Tuesday 05 Mar 2024 at 12:26:59 (-0800), Elliot Berman wrote: > > I still disagree that this is a Gunyah-specific problem. As far as we > > can tell, Arm doesn't specify how EL2 can tell EL1 its S2 page tables > > couldn't give a validation translation of the IPA from stage 1. IMO, > > downstream/Android pKVM is violating spec for ESR_EL1 by using the > > S1PTW bit (which is res0 for everyone except EL2 [1]) and this means > > that guests need to be pKVM-enlightened. > > Not really, in pKVM we have a very clear distinction between host Linux > and guests, and only the host needs to be enlightened. But luckily, > since pKVM is part of Linux, this is pretty much an internal kernel > thing, so we're very flexible and if the S1PTW trick ever conflicts > with something else (e.g. NV) we can fairly easily switch to another > approach. We can tolerate non-architectural tricks like that between > pKVM and host Linux because that is not ABI, but we certainly can't do > that for guests. > > > If we are adding pKVM > > enlightment in the exception handlers, can we add Gunyah enlightment to > > handle the same? > > If you mean extending the Linux SEA handler so it does what Gunyah > wants, then I'm personally not supportive of that idea since the > 'contract' between Linux and Gunyah _is_ the architecture. Fair enough. We're building out more use cases where we want to allocate memory from buddy and donate it to some entity which unmaps it from Linux (some entity = Gunyah or Qualcomm firmware). Video DRM is an example we're working on. I imagine OP-TEE might eventually have use-cases as well since pKVM is doing same. David expressed concerns about exporting the direct unmap functions. What kind of framework/restrictions do we want to have instead? I don't think making drivers like Gunyah a builtin-only module [1] (even a refactored/small portion) is the best approach, but maybe that is what we want to do. 
Thanks, Elliot [1]: qcom_scm_assign_mem (d/firmware/qcom/qcom_scm.ko) is an example of a module that would have to become builtin as we upstream use cases that lend buddy-allocated memory to firmware
* Elliot Berman <quic_eberman@quicinc.com> [2024-02-22 15:16:35]: > Gunyah doesn't process the label and that makes it > possible for userspace to create multiple resources with the same label. I think that description conflicts with what is implemented in this patch? int gunyah_vm_add_resource_ticket(struct gunyah_vm *ghvm, struct gunyah_vm_resource_ticket *ticket) { mutex_lock(&ghvm->resources_lock); list_for_each_entry(iter, &ghvm->resource_tickets, vm_list) { if (iter->resource_type == ticket->resource_type && iter->label == ticket->label) { ret = -EEXIST; goto out; } } // > @@ -134,6 +246,25 @@ static int gunyah_vm_start(struct gunyah_vm *ghvm) > } > ghvm->vm_status = GUNYAH_RM_VM_STATUS_READY; > > + ret = gunyah_rm_get_hyp_resources(ghvm->rm, ghvm->vmid, &resources); > + if (ret) { > + dev_warn(ghvm->parent, > + "Failed to get hypervisor resources for VM: %d\n", > + ret); > + goto err; > + } Where do we free memory pointed by 'resources' ptr? > + > + for (i = 0, n = le32_to_cpu(resources->n_entries); i < n; i++) { > + ghrsc = gunyah_rm_alloc_resource(ghvm->rm, > + &resources->entries[i]); > + if (!ghrsc) { > + ret = -ENOMEM; > + goto err; > + } > + > + gunyah_vm_add_resource(ghvm, ghrsc); > + }
* Elliot Berman <quic_eberman@quicinc.com> [2024-02-22 15:16:33]: > Add ioctl to trigger the start of a Gunyah virtual machine. Subsequent > commits will provide memory to the virtual machine and add ability to > interact with the resources (capabilities) of the virtual machine. > Although start of the virtual machine can be done implicitly on the > first vCPU run for proxy-schedule virtual machines, there is a > non-trivial number of calls to Gunyah: a more precise error can be given > to userspace which calls VM_START without looking at kernel logs because > userspace can detect that the VM start failed instead of "couldn't run > the vCPU". > > Co-developed-by: Prakruthi Deepak Heragu <quic_pheragu@quicinc.com> > Signed-off-by: Prakruthi Deepak Heragu <quic_pheragu@quicinc.com> > Signed-off-by: Elliot Berman <quic_eberman@quicinc.com> Minor nit below. LGTM otherwise. Reviewed-by: Srivatsa Vaddagiri <quic_svaddagi@quicinc.com> > +static int gunyah_vm_start(struct gunyah_vm *ghvm) > +{ .. > +err: > + /* gunyah_vm_free will handle releasing resources and reclaiming memory */ s/gunyah_vm_free/gunyah_vm_release > + up_write(&ghvm->status_lock); > + return ret; > +}
* Elliot Berman <quic_eberman@quicinc.com> [2024-02-22 15:16:34]: > When booting a Gunyah virtual machine, the host VM may gain capabilities > to interact with resources for the guest virtual machine. Examples of > such resources are vCPUs or message queues. To use those resources, we > need to translate the RM response into a gunyah_resource structure which > are useful to Linux drivers. Presently, Linux drivers need only to know > the type of resource, the capability ID, and an interrupt. > > On ARM64 systems, the interrupt reported by Gunyah is the GIC interrupt > ID number and always a SPI or extended SPI. > > Signed-off-by: Elliot Berman <quic_eberman@quicinc.com> Minor nit below. LGTM otherwise Reviewed-by: Srivatsa Vaddagiri <quic_svaddagi@quicinc.com> > +struct gunyah_resource * > +gunyah_rm_alloc_resource(struct gunyah_rm *rm, > + struct gunyah_rm_hyp_resource *hyp_resource) > +{ > + struct gunyah_resource *ghrsc; > + int ret; > + > + ghrsc = kzalloc(sizeof(*ghrsc), GFP_KERNEL); > + if (!ghrsc) > + return NULL; > + > + ghrsc->type = hyp_resource->type; > + ghrsc->capid = le64_to_cpu(hyp_resource->cap_id); > + ghrsc->irq = IRQ_NOTCONNECTED; > + ghrsc->rm_label = le32_to_cpu(hyp_resource->resource_label); > + if (hyp_resource->virq) { > + struct irq_fwspec fwspec; > + > + > + fwspec.fwnode = rm->parent_fwnode; > + ret = arch_gunyah_fill_irq_fwspec_params(le32_to_cpu(hyp_resource->virq), &fwspec); > + if (ret) { > + dev_err(rm->dev, > + "Failed to translate interrupt for resource %d label: %d: %d\n", > + ghrsc->type, ghrsc->rm_label, ret); Not bailing on error here appears wrong. Can you check? > + } > + > + ret = irq_create_fwspec_mapping(&fwspec); > + if (ret < 0) { > + dev_err(rm->dev, > + "Failed to allocate interrupt for resource %d label: %d: %d\n", > + ghrsc->type, ghrsc->rm_label, ret); > + kfree(ghrsc); > + return NULL; > + } > + ghrsc->irq = ret; > + } > + > + return ghrsc; > +}
On Mon, Mar 11, 2024 at 11:08:06AM +0530, Srivatsa Vaddagiri wrote: > * Elliot Berman <quic_eberman@quicinc.com> [2024-02-22 15:16:35]: > > > Gunyah doesn't process the label and that makes it > > possible for userspace to create multiple resources with the same label. > > I think that description conflicts with what is implemented in this patch? > > int gunyah_vm_add_resource_ticket(struct gunyah_vm *ghvm, > struct gunyah_vm_resource_ticket *ticket) > { > mutex_lock(&ghvm->resources_lock); > list_for_each_entry(iter, &ghvm->resource_tickets, vm_list) { > if (iter->resource_type == ticket->resource_type && > iter->label == ticket->label) { > ret = -EEXIST; > goto out; > } > } > > > // > It's a justification for the next sentence in the commit text: Resource ticket owners need to be prepared for populate to be called multiple times if userspace created multiple resources with the same label. VM manager can make sure that only one entity is going to receive the resources (the check above you highlighted), but I can't make sure that it's only going to exactly one resource. We don't currently have a scenario where we need/want multiple resources with the same label, we might have that in the future and I didn't want to add that restriction in common code. > > @@ -134,6 +246,25 @@ static int gunyah_vm_start(struct gunyah_vm *ghvm) > > } > > ghvm->vm_status = GUNYAH_RM_VM_STATUS_READY; > > > > + ret = gunyah_rm_get_hyp_resources(ghvm->rm, ghvm->vmid, &resources); > > + if (ret) { > > + dev_warn(ghvm->parent, > > + "Failed to get hypervisor resources for VM: %d\n", > > + ret); > > + goto err; > > + } > > Where do we free memory pointed by 'resources' ptr? > Ah, I meant to add the __free(kfree) annotation. I've added it. 
> > + > > + for (i = 0, n = le32_to_cpu(resources->n_entries); i < n; i++) { > > + ghrsc = gunyah_rm_alloc_resource(ghvm->rm, > > + &resources->entries[i]); > > + if (!ghrsc) { > > + ret = -ENOMEM; > > + goto err; > > + } > > + > > + gunyah_vm_add_resource(ghvm, ghrsc); > > + }
On Mon, Mar 11, 2024 at 11:09:05AM +0530, Srivatsa Vaddagiri wrote: > * Elliot Berman <quic_eberman@quicinc.com> [2024-02-22 15:16:34]: > > > When booting a Gunyah virtual machine, the host VM may gain capabilities > > to interact with resources for the guest virtual machine. Examples of > > such resources are vCPUs or message queues. To use those resources, we > > need to translate the RM response into a gunyah_resource structure which > > are useful to Linux drivers. Presently, Linux drivers need only to know > > the type of resource, the capability ID, and an interrupt. > > > > On ARM64 systems, the interrupt reported by Gunyah is the GIC interrupt > > ID number and always a SPI or extended SPI. > > > > Signed-off-by: Elliot Berman <quic_eberman@quicinc.com> > > Minor nit below. LGTM otherwise > > Reviewed-by: Srivatsa Vaddagiri <quic_svaddagi@quicinc.com> > > > +struct gunyah_resource * > > +gunyah_rm_alloc_resource(struct gunyah_rm *rm, > > + struct gunyah_rm_hyp_resource *hyp_resource) > > +{ > > + struct gunyah_resource *ghrsc; > > + int ret; > > + > > + ghrsc = kzalloc(sizeof(*ghrsc), GFP_KERNEL); > > + if (!ghrsc) > > + return NULL; > > + > > + ghrsc->type = hyp_resource->type; > > + ghrsc->capid = le64_to_cpu(hyp_resource->cap_id); > > + ghrsc->irq = IRQ_NOTCONNECTED; > > + ghrsc->rm_label = le32_to_cpu(hyp_resource->resource_label); > > + if (hyp_resource->virq) { > > + struct irq_fwspec fwspec; > > + > > + > > + fwspec.fwnode = rm->parent_fwnode; > > + ret = arch_gunyah_fill_irq_fwspec_params(le32_to_cpu(hyp_resource->virq), &fwspec); > > + if (ret) { > > + dev_err(rm->dev, > > + "Failed to translate interrupt for resource %d label: %d: %d\n", > > + ghrsc->type, ghrsc->rm_label, ret); > > Not bailing on error here appears wrong. Can you check? > Ah, yes. I'll return ghrsc here. 
I think it's better than returning NULL because the user of the resource might be able to cope without the interrupt, and it lets us get more helpful kernel log messages because the higher-level VM function will complain. > > + } > > + > > + ret = irq_create_fwspec_mapping(&fwspec); > > + if (ret < 0) { > > + dev_err(rm->dev, > > + "Failed to allocate interrupt for resource %d label: %d: %d\n", > > + ghrsc->type, ghrsc->rm_label, ret); > > + kfree(ghrsc); > > + return NULL; > > + } > > + ghrsc->irq = ret; > > + } > > + > > + return ghrsc; > > +}
* Elliot Berman <quic_eberman@quicinc.com> [2024-02-22 15:16:36]: > Introduce a framework for Gunyah userspace to install VM functions. VM > functions are optional interfaces to the virtual machine. vCPUs, > ioeventfds, and irqfds are examples of such VM functions and are > implemented in subsequent patches. > > A generic framework is implemented instead of individual ioctls to > create vCPUs, irqfds, etc., in order to simplify the VM manager core > implementation and allow dynamic loading of VM function modules. > > Signed-off-by: Elliot Berman <quic_eberman@quicinc.com> Minor nit below. LGTM otherwise Reviewed-by: Srivatsa Vaddagiri <quic_svaddagi@quicinc.com> > +static struct gunyah_vm_function *gunyah_vm_get_function(u32 type) > +{ > + struct gunyah_vm_function *fn; > + > + fn = xa_load(&gunyah_vm_functions, type); > + if (!fn) { > + request_module("ghfunc:%d", type); s/%d/%u > + > + fn = xa_load(&gunyah_vm_functions, type); > + } > + > + if (!fn || !try_module_get(fn->mod)) > + fn = ERR_PTR(-ENOENT); > + > + return fn; > +}
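The lazy-load-then-retry lookup that `request_module()` enables can be sketched standalone. Every name below (the fixed-size table, `demo_fn`, `fake_request_module`) is hypothetical; it models only the control flow of gunyah_vm_get_function, not the xarray-based implementation or module refcounting:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TYPE 16

typedef int (*vm_fn)(void);

/* Hypothetical registry: zero-initialized, indexed by function type. */
static vm_fn registry[MAX_TYPE];

static int demo_fn(void) { return 42; }

/* Stands in for request_module("ghfunc:%u", type): on "load", the module
 * registers its function, here hard-coded for type 3 only. */
static void fake_request_module(unsigned int type)
{
	if (type == 3)
		registry[3] = demo_fn;
}

static vm_fn get_function(unsigned int type)
{
	if (type >= MAX_TYPE)
		return NULL;
	if (!registry[type])
		fake_request_module(type); /* lazy load, then retry lookup */
	return registry[type];
}

int main(void)
{
	assert(get_function(3) != NULL);	/* resolved via lazy load */
	assert(get_function(3)() == 42);
	assert(get_function(5) == NULL);	/* no provider for this type */
	return 0;
}
```

The point of the pattern is that the ioctl caller never needs to know whether the function type was built in, already loaded, or loaded on demand; the second `xa_load()` in the real code is exactly the retry after the module has had a chance to register itself.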
* Elliot Berman <quic_eberman@quicinc.com> [2024-02-22 15:16:37]: > Add hypercall to donate CPU time to a vCPU. > > Signed-off-by: Elliot Berman <quic_eberman@quicinc.com> LGTM Reviewed-by: Srivatsa Vaddagiri <quic_svaddagi@quicinc.com> - vatsa
* Elliot Berman <quic_eberman@quicinc.com> [2024-02-22 15:16:38]: > Gunyah allows vCPUs that are configured as proxy-scheduled to be scheduled by > another virtual machine (host) that holds capabilities to those vCPUs with > suitable rights. > > Gunyah also supports configuring regions of a proxy-scheduled VM's address > space to be virtualized by the host VM. This permits a host VMM to emulate MMIO > devices in the proxy-scheduled VM. > > vCPUs are presented to the host as a Gunyah resource and represented to > userspace as a Gunyah VM function. > > Creating the vcpu function on the VM will create a file descriptor that: > - can handle an ioctl to run the vCPU. When called, Gunyah will directly > context-switch to the selected vCPU and run it until one of the following > events occurs: > * the host vcpu's time slice ends > * the host vcpu receives an interrupt or would have been pre-empted > by the hypervisor > * a fault occurs in the proxy-scheduled vcpu > * a power management event, such as idle or cpu-off call in the vcpu > - can be mmap'd to share the gunyah_vcpu_run structure with userspace. This > allows the vcpu_run result codes to be accessed, and for arguments to > vcpu_run to be passed, e.g. for resuming the vcpu when handling certain fault > and exit cases. > > Co-developed-by: Prakruthi Deepak Heragu <quic_pheragu@quicinc.com> > Signed-off-by: Prakruthi Deepak Heragu <quic_pheragu@quicinc.com> > Signed-off-by: Elliot Berman <quic_eberman@quicinc.com> LGTM Reviewed-by: Srivatsa Vaddagiri <quic_svaddagi@quicinc.com> - vatsa
* Elliot Berman <quic_eberman@quicinc.com> [2024-02-22 15:16:39]: > Three hypercalls are needed to support demand paging. > To create page mappings for a virtual machine's address space, memory > must be moved to a memory extent that is allowed to be mapped into that > address space. Memory extents are Gunyah's implementation of access > control. Once the memory is moved to the proper memory extent, the > memory can be mapped into the VM's address space. Implement the > bindings to perform those hypercalls. > > Signed-off-by: Elliot Berman <quic_eberman@quicinc.com> Reviewed-by: Srivatsa Vaddagiri <quic_svaddagiri@quicinc.com>
* Elliot Berman <quic_eberman@quicinc.com> [2024-02-22 15:16:40]: > In a Gunyah hypervisor system using the Gunyah Resource Manager, the > "standard" unit of donating, lending and sharing memory is called a > memory parcel (memparcel). A memparcel is an abstraction used by the > resource manager for securely managing donating, lending and sharing > memory, which may be physically and virtually fragmented, without > dealing directly with physical memory addresses. > > Memparcels are created and managed through the RM RPC functions for > lending, sharing and reclaiming memory from VMs. > > When creating a new VM the initial VM memory containing the VM image and > the VM's device tree blob must be provided as a memparcel. The memparcel > must be created using the RM RPC for lending and mapping the memory to > the VM. > > Signed-off-by: Elliot Berman <quic_eberman@quicinc.com> Reviewed-by: Srivatsa Vaddagiri <quic_svaddagi@quicinc.com> - vatsa
* Elliot Berman <quic_eberman@quicinc.com> [2024-02-22 15:16:52]: > The initial context of the primary vCPU can be initialized by > performing RM RPC calls. > > Signed-off-by: Elliot Berman <quic_eberman@quicinc.com> Reviewed-by: Srivatsa Vaddagiri <quic_svaddagi@quicinc.com> - vatsa
* Elliot Berman <quic_eberman@quicinc.com> [2024-02-22 15:16:53]: > RM provides APIs to fill the boot context (the initial registers) upon > starting the vCPU. Most importantly, this allows userspace to set the > initial PC for the primary vCPU when the VM starts to run. > > Signed-off-by: Elliot Berman <quic_eberman@quicinc.com> Reviewed-by: Srivatsa Vaddagiri <quic_svaddagi@quicinc.com>
On Thu, Feb 22, 2024 at 03:16:34PM -0800, Elliot Berman wrote: > When booting a Gunyah virtual machine, the host VM may gain capabilities > to interact with resources for the guest virtual machine. Examples of > such resources are vCPUs or message queues. To use those resources, we > need to translate the RM response into a gunyah_resource structure which > are useful to Linux drivers. Presently, Linux drivers need only to know > the type of resource, the capability ID, and an interrupt. > > On ARM64 systems, the interrupt reported by Gunyah is the GIC interrupt > ID number and always a SPI or extended SPI. > > Signed-off-by: Elliot Berman <quic_eberman@quicinc.com> > --- > arch/arm64/include/asm/gunyah.h | 36 ++++++++++++++++++++++ > drivers/virt/gunyah/rsc_mgr.c | 67 +++++++++++++++++++++++++++++++++++++++++ > drivers/virt/gunyah/rsc_mgr.h | 5 +++ > include/linux/gunyah.h | 2 ++ > 4 files changed, 110 insertions(+) > > diff --git a/arch/arm64/include/asm/gunyah.h b/arch/arm64/include/asm/gunyah.h > new file mode 100644 > index 0000000000000..0cd3debe22b64 > --- /dev/null > +++ b/arch/arm64/include/asm/gunyah.h > @@ -0,0 +1,36 @@ > +/* SPDX-License-Identifier: GPL-2.0-only */ > +/* > + * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved. 
> + */ > +#ifndef _ASM_GUNYAH_H > +#define _ASM_GUNYAH_H > + > +#include <linux/irq.h> > +#include <linux/irqdomain.h> > + > +static inline int arch_gunyah_fill_irq_fwspec_params(u32 virq, > + struct irq_fwspec *fwspec) > +{ > + /* Assume that Gunyah gave us an SPI or ESPI; defensively check it */ > + if (WARN(virq < 32, "Unexpected virq: %d\n", virq)) { > + return -EINVAL; > + } else if (virq <= 1019) { > + fwspec->param_count = 3; > + fwspec->param[0] = 0; /* GIC_SPI */ > + fwspec->param[1] = virq - 32; /* virq 32 -> SPI 0 */ > + fwspec->param[2] = IRQ_TYPE_EDGE_RISING; > + } else if (WARN(virq < 4096, "Unexpected virq: %d\n", virq)) { > + return -EINVAL; > + } else if (virq < 5120) { > + fwspec->param_count = 3; > + fwspec->param[0] = 2; /* GIC_ESPI */ > + fwspec->param[1] = virq - 4096; /* virq 4096 -> ESPI 0 */ > + fwspec->param[2] = IRQ_TYPE_EDGE_RISING; > + } else { > + WARN(1, "Unexpected virq: %d\n", virq); > + return -EINVAL; > + } > + return 0; > +} > + __get_intid_range() in gic-v3 driver looks more pleasing. Other than that the logic for the translation looks good to me. Thanks, Pavan
On Fri, Apr 05, 2024 at 08:40:40AM +0530, Pavan Kondeti wrote: > On Thu, Feb 22, 2024 at 03:16:34PM -0800, Elliot Berman wrote: > > When booting a Gunyah virtual machine, the host VM may gain capabilities > > to interact with resources for the guest virtual machine. Examples of > > such resources are vCPUs or message queues. To use those resources, we > > need to translate the RM response into a gunyah_resource structure which > > are useful to Linux drivers. Presently, Linux drivers need only to know > > the type of resource, the capability ID, and an interrupt. > > > > On ARM64 systems, the interrupt reported by Gunyah is the GIC interrupt > > ID number and always a SPI or extended SPI. > > > > Signed-off-by: Elliot Berman <quic_eberman@quicinc.com> > > --- > > arch/arm64/include/asm/gunyah.h | 36 ++++++++++++++++++++++ > > drivers/virt/gunyah/rsc_mgr.c | 67 +++++++++++++++++++++++++++++++++++++++++ > > drivers/virt/gunyah/rsc_mgr.h | 5 +++ > > include/linux/gunyah.h | 2 ++ > > 4 files changed, 110 insertions(+) > > > > diff --git a/arch/arm64/include/asm/gunyah.h b/arch/arm64/include/asm/gunyah.h > > new file mode 100644 > > index 0000000000000..0cd3debe22b64 > > --- /dev/null > > +++ b/arch/arm64/include/asm/gunyah.h > > @@ -0,0 +1,36 @@ > > +/* SPDX-License-Identifier: GPL-2.0-only */ > > +/* > > + * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved. 
> > + */ > > +#ifndef _ASM_GUNYAH_H > > +#define _ASM_GUNYAH_H > > + > > +#include <linux/irq.h> > > +#include <linux/irqdomain.h> > > + > > +static inline int arch_gunyah_fill_irq_fwspec_params(u32 virq, > > + struct irq_fwspec *fwspec) > > +{ > > + /* Assume that Gunyah gave us an SPI or ESPI; defensively check it */ > > + if (WARN(virq < 32, "Unexpected virq: %d\n", virq)) { > > + return -EINVAL; > > + } else if (virq <= 1019) { > > + fwspec->param_count = 3; > > + fwspec->param[0] = 0; /* GIC_SPI */ > > + fwspec->param[1] = virq - 32; /* virq 32 -> SPI 0 */ > > + fwspec->param[2] = IRQ_TYPE_EDGE_RISING; > > + } else if (WARN(virq < 4096, "Unexpected virq: %d\n", virq)) { > > + return -EINVAL; > > + } else if (virq < 5120) { > > + fwspec->param_count = 3; > > + fwspec->param[0] = 2; /* GIC_ESPI */ > > + fwspec->param[1] = virq - 4096; /* virq 4096 -> ESPI 0 */ > > + fwspec->param[2] = IRQ_TYPE_EDGE_RISING; > > + } else { > > + WARN(1, "Unexpected virq: %d\n", virq); > > + return -EINVAL; > > + } > > + return 0; > > +} > > + > > __get_intid_range() in gic-v3 driver looks more pleasing. Other than > that the logic for the translation looks good to me. Agreed, updated for v18:

static inline int arch_gunyah_fill_irq_fwspec_params(u32 virq,
						     struct irq_fwspec *fwspec)
{
	/* Assume that Gunyah gave us an SPI or ESPI; defensively check it */
	switch (virq) {
	case 32 ... 1019:
		fwspec->param_count = 3;
		fwspec->param[0] = 0; /* GIC_SPI */
		fwspec->param[1] = virq - 32; /* virq 32 -> SPI 0 */
		fwspec->param[2] = IRQ_TYPE_EDGE_RISING;
		break;
	case 4096 ... 5119:
		fwspec->param_count = 3;
		fwspec->param[0] = 2; /* GIC_ESPI */
		fwspec->param[1] = virq - 4096; /* virq 4096 -> ESPI 0 */
		fwspec->param[2] = IRQ_TYPE_EDGE_RISING;
		break;
	default:
		WARN(1, "Unexpected virq: %d\n", virq);
		return -EINVAL;
	}
	return 0;
}
* Elliot Berman <quic_eberman@quicinc.com> [2024-02-22 15:16:38]: > +/** > + * struct gunyah_vm_exit_info - Reason for VM exit as reported by Gunyah > + * See Gunyah documentation for values. > + * @type: Describes how VM exited > + * @padding: padding bytes > + * @reason_size: Number of bytes valid for `reason` > + * @reason: See Gunyah documentation for interpretation. Note: these values are > + * not interpreted by Linux and need to be converted from little-endian > + * as applicable. > + */ > +struct gunyah_vm_exit_info { > + __u16 type; Pls add an enum to describe the various exit types. > + __u16 padding; > + __u32 reason_size; > + __u8 reason[GUNYAH_VM_MAX_EXIT_REASON_SIZE]; > +}; - vatsa
On Wed, Apr 24, 2024 at 03:09:57PM +0530, Srivatsa Vaddagiri wrote: > * Elliot Berman <quic_eberman@quicinc.com> [2024-02-22 15:16:38]: > > > +/** > > + * struct gunyah_vm_exit_info - Reason for VM exit as reported by Gunyah > > + * See Gunyah documentation for values. > > + * @type: Describes how VM exited > > + * @padding: padding bytes > > + * @reason_size: Number of bytes valid for `reason` > > + * @reason: See Gunyah documentation for interpretation. Note: these values are > > + * not interpreted by Linux and need to be converted from little-endian > > + * as applicable. > > + */ > > +struct gunyah_vm_exit_info { > > + __u16 type; > > Pls add an enum to describe the various exit types. > When I first added the exit_info, I thought a bunch about whether to add the enum. I was thinking that if we add it to the UAPI, the kernel needs to parse and validate the values from RM. In the current scheme, I can leave it to userspace to interpret the value from RM. I could be convinced to go the other way though. Currently this is less code :) Let me know your thoughts! > > + __u16 padding; > > + __u32 reason_size; > > + __u8 reason[GUNYAH_VM_MAX_EXIT_REASON_SIZE]; If we start interpreting the @type, then maybe we should also start interpreting the @reason, and that will also add debt to the kernel driver. Thanks, Elliot
Gunyah is a Type-1 hypervisor independent of any high-level OS kernel, and runs in a higher CPU privilege level. It does not depend on any lower-privileged OS kernel/code for its core functionality. This increases its security and can support a much smaller trusted computing base than a Type-2 hypervisor.

Gunyah is designed for isolated virtual machine use cases and to support launching trusted+isolated virtual machines from a relatively less trusted host virtual machine.

Gunyah is an open source hypervisor. The source repo is available at https://github.com/quic/gunyah-hypervisor.

The diagram below shows the architecture for AArch64. ::

         VM A                    VM B
     +-----+ +-----+    +-----+ +-----+ +-----+
     |     | |     |    |     | |     | |     |
 EL0 | APP | | APP |    | APP | | APP | | APP |
     |     | |     |    |     | |     | |     |
     +-----+ +-----+    +-----+ +-----+ +-----+
 ---------------------|-------------------------
     +--------------+ | +----------------------+
     |              | | |                      |
 EL1 | Linux Kernel | | |Linux kernel/Other OS |  ...
     |              | | |                      |
     +--------------+ | +----------------------+
 --------hvc/smc------|------hvc/smc------------
     +----------------------------------------+
     |                                        |
 EL2 |            Gunyah Hypervisor           |
     |                                        |
     +----------------------------------------+

Gunyah provides the following features.

- Threads and Scheduling: The scheduler schedules virtual CPUs (VCPUs) on physical CPUs and enables time-sharing of the CPUs.

- Memory Management: Gunyah tracks memory ownership and use of all memory under its control. It provides low level dynamic memory management APIs on top of which higher level donation, lending and sharing is built. Gunyah provides strong VM memory isolation for trusted VMs.

- Interrupt Virtualization: Interrupts are managed by the hypervisor and are routed directly to the assigned VM.

- Inter-VM Communication: There are several different mechanisms provided for communicating between VMs.

- Device Virtualization: Para-virtualization of devices is supported using inter-VM communication and virtio primitives. Low level architecture features and devices such as cpu timers, interrupt controllers are supported with hardware virtualization and emulation where required.

- Resource Manager: Gunyah supports a "root" VM that initially owns all VM memory and IO resources. The Gunyah Resource Manager is the default bundled root VM and provides high-level services including dynamic VM management and secure memory donation, lending and sharing.

This series adds the basic framework for detecting that Linux is running under Gunyah as a virtual machine, communication with the Gunyah Resource Manager, and a sample virtual machine manager capable of launching virtual machines.

Changes in v17:
- Replace RM's irq_chip with irq_create_fwspec_mapping
- Unmap lent memory from kernel logical map
- Small optimization to unlock folio earlier to allow vCPUs racing for the same folio to run sooner
- Add missed "safe to lend" checks for a folio when constructing mem parcel

Changes in v16: https://lore.kernel.org/r/20240109-gunyah-v16-0-634904bf4ce9@quicinc.com
- Fleshed out memory reclaim while VM is running
- Documentation and comments

Changes in v15: https://lore.kernel.org/r/20231215-gunyah-v15-0-192a5d872a30@quicinc.com
- First implementation of virtual machines backed by guestmemfd and using demand paging to provide memory instead of all up front.
- Use message queue hypercalls directly instead of traversing through mailbox framework.

Changes in v14: https://lore.kernel.org/all/20230613172054.3959700-1-quic_eberman@quicinc.com/
- Coding/cosmetic tweaks suggested by Alex
- Mark IRQs as wake-up capable

Changes in v13: https://lore.kernel.org/all/20230509204801.2824351-1-quic_eberman@quicinc.com/
- Tweaks to message queue driver to address race condition between IRQ and mailbox registration
- Allow removal of VM functions by function-specific comparison -- specifically to allow removing irqfd by label only and not requiring original FD to be provided.
Changes in v12: https://lore.kernel.org/all/20230424231558.70911-1-quic_eberman@quicinc.com/
- Stylistic/cosmetic tweaks suggested by Alex
- Remove patch "virt: gunyah: Identify hypervisor version" and squash the
  check that we're running under a reasonable Gunyah hypervisor into RM
  driver
- Refactor platform hooks into a separate module per suggestion from Srini
- GFP_KERNEL_ACCOUNT and account_locked_vm() for page pinning
- enum-ify related constants

Changes in v11: https://lore.kernel.org/all/20230304010632.2127470-1-quic_eberman@quicinc.com/
- Rename struct gh_vm_dtb_config:gpa -> guest_phys_addr & overflow checks
  for this
- More docstrings throughout
- Make resp_buf and resp_buf_size optional
- Replace deprecated idr with xarray
- Refcounting on misc device instead of RM's platform device
- Renaming variables, structs, etc. from gunyah_ -> gh_
- Drop removal of user mem regions
- Drop mem_lend functionality; to converge with restricted_memfd later

Changes in v10: https://lore.kernel.org/all/20230214211229.3239350-1-quic_eberman@quicinc.com/
- Fix bisectability (end result of series is same, --fixups applied to
  wrong commits)
- Convert GH_ERROR_* and GH_RM_ERROR_* to enums
- Correct race condition between allocating/freeing user memory
- Replace offsetof with struct_size
- Series-wide renaming of functions to be more consistent
- VM shutdown & restart support added in vCPU and VM Manager patches
- Convert VM function name (string) to type (number)
- Convert VM function argument to value (which could be a pointer) to
  remove memory wastage for arguments
- Remove defensive checks of hypervisor correctness
- Clean-ups to ioeventfd as suggested by Srivatsa

Changes in v9: https://lore.kernel.org/all/20230120224627.4053418-1-quic_eberman@quicinc.com/
- Refactor Gunyah API flags to be exposed as feature flags at kernel level
- Move mbox client cleanup into gunyah_msgq_remove()
- Simplify gh_rm_call return value and response payload
- Missing clean-up/error handling/little endian fixes as suggested by
  Srivatsa and Alex in v8 series

Changes in v8: https://lore.kernel.org/all/20221219225850.2397345-1-quic_eberman@quicinc.com/
- Treat VM manager as a library of RM
- Add patches 21-28 as RFC to support proxy-scheduled vCPUs and necessary
  bits to support virtio from Gunyah userspace

Changes in v7: https://lore.kernel.org/all/20221121140009.2353512-1-quic_eberman@quicinc.com/
- Refactor to remove gunyah RM bus
- Refactor to allow multiple RM device instances
- Bump UAPI to start at 0x0
- Refactor QCOM SCM's platform hooks to allow
  CONFIG_QCOM_SCM=Y/CONFIG_GUNYAH=M combinations

Changes in v6: https://lore.kernel.org/all/20221026185846.3983888-1-quic_eberman@quicinc.com/
- *Replace gunyah-console with gunyah VM Manager*
- Move include/asm-generic/gunyah.h into include/linux/gunyah.h
- s/gunyah_msgq/gh_msgq/
- Minor tweaks and documentation tidying based on comments from Jiri, Greg,
  Arnd, Dmitry, and Bagas.

Changes in v5: https://lore.kernel.org/all/20221011000840.289033-1-quic_eberman@quicinc.com/
- Dropped sysfs nodes
- Switch from aux bus to Gunyah RM bus for the subdevices
- Cleaning up RM console

Changes in v4: https://lore.kernel.org/all/20220928195633.2348848-1-quic_eberman@quicinc.com/
- Tidied up documentation throughout based on questions/feedback received
- Switched message queue implementation to use mailboxes
- Renamed "gunyah_device" as "gunyah_resource"

Changes in v3: https://lore.kernel.org/all/20220811214107.1074343-1-quic_eberman@quicinc.com/
- /Maintained/Supported/ in MAINTAINERS
- Tidied up documentation throughout based on questions/feedback received
- Moved hypercalls into arch/arm64/gunyah/; following hyper-v's
  implementation
- Drop opaque typedefs
- Move sysfs nodes under /sys/hypervisor/gunyah/
- Moved Gunyah console driver to drivers/tty/
- Reworked gh_device design to drop the Gunyah bus.
Changes in v2: https://lore.kernel.org/all/20220801211240.597859-1-quic_eberman@quicinc.com/
- DT bindings clean up
- Switch hypercalls to follow SMCCC

v1: https://lore.kernel.org/all/20220223233729.1571114-1-quic_eberman@quicinc.com/

Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
---
Elliot Berman (35):
      docs: gunyah: Introduce Gunyah Hypervisor
      dt-bindings: Add binding for gunyah hypervisor
      gunyah: Common types and error codes for Gunyah hypercalls
      virt: gunyah: Add hypercalls to identify Gunyah
      virt: gunyah: Add hypervisor driver
      virt: gunyah: msgq: Add hypercalls to send and receive messages
      gunyah: rsc_mgr: Add resource manager RPC core
      gunyah: vm_mgr: Introduce basic VM Manager
      gunyah: rsc_mgr: Add VM lifecycle RPC
      gunyah: vm_mgr: Add VM start/stop
      virt: gunyah: Translate gh_rm_hyp_resource into gunyah_resource
      virt: gunyah: Add resource tickets
      gunyah: vm_mgr: Add framework for VM Functions
      virt: gunyah: Add hypercalls for running a vCPU
      virt: gunyah: Add proxy-scheduled vCPUs
      gunyah: Add hypercalls for demand paging
      gunyah: rsc_mgr: Add memory parcel RPC
      mm/interval_tree: Export iter_first/iter_next
      arch/mm: Export direct {un,}map functions
      virt: gunyah: Add interfaces to map memory into guest address space
      gunyah: rsc_mgr: Add platform ops on mem_lend/mem_reclaim
      virt: gunyah: Add Qualcomm Gunyah platform ops
      virt: gunyah: Implement guestmemfd
      virt: gunyah: Add ioctl to bind guestmem to VMs
      virt: gunyah: guestmem: Initialize RM mem parcels from guestmem
      virt: gunyah: Share guest VM dtb configuration to Gunyah
      gunyah: rsc_mgr: Add RPC to enable demand paging
      virt: gunyah: Enable demand paging
      gunyah: rsc_mgr: Add RPC to set VM boot context
      virt: gunyah: Allow userspace to initialize context of primary vCPU
      virt: gunyah: Add hypercalls for sending doorbell
      virt: gunyah: Add irqfd interface
      virt: gunyah: Add IO handlers
      virt: gunyah: Add ioeventfd
      MAINTAINERS: Add Gunyah hypervisor drivers section

 .../bindings/firmware/gunyah-hypervisor.yaml       |  82 ++
 Documentation/userspace-api/ioctl/ioctl-number.rst |   1 +
 Documentation/virt/gunyah/index.rst                | 135 +++
 Documentation/virt/gunyah/message-queue.rst        |  68 ++
 Documentation/virt/index.rst                       |   1 +
 MAINTAINERS                                        |  12 +
 arch/arm64/Kbuild                                  |   1 +
 arch/arm64/gunyah/Makefile                         |   3 +
 arch/arm64/gunyah/gunyah_hypercall.c               | 279 ++++++
 arch/arm64/include/asm/gunyah.h                    |  57 ++
 arch/arm64/mm/pageattr.c                           |   3 +
 drivers/virt/Kconfig                               |   2 +
 drivers/virt/Makefile                              |   1 +
 drivers/virt/gunyah/Kconfig                        |  47 +
 drivers/virt/gunyah/Makefile                       |   9 +
 drivers/virt/gunyah/guest_memfd.c                  | 987 ++++++++++++++++++++
 drivers/virt/gunyah/gunyah.c                       |  52 ++
 drivers/virt/gunyah/gunyah_ioeventfd.c             | 139 +++
 drivers/virt/gunyah/gunyah_irqfd.c                 | 187 ++++
 drivers/virt/gunyah/gunyah_platform_hooks.c        | 117 +++
 drivers/virt/gunyah/gunyah_qcom.c                  | 220 +++++
 drivers/virt/gunyah/gunyah_vcpu.c                  | 590 ++++++++++++
 drivers/virt/gunyah/rsc_mgr.c                      | 836 +++++++++++++++++
 drivers/virt/gunyah/rsc_mgr.h                      | 144 +++
 drivers/virt/gunyah/rsc_mgr_rpc.c                  | 602 +++++++++++++
 drivers/virt/gunyah/vm_mgr.c                       | 993 +++++++++++++++++++++
 drivers/virt/gunyah/vm_mgr.h                       | 215 +++++
 drivers/virt/gunyah/vm_mgr_mem.c                   | 356 ++++++++
 include/linux/gunyah.h                             | 483 ++++++++++
 include/uapi/linux/gunyah.h                        | 378 ++++++++
 mm/interval_tree.c                                 |   3 +
 31 files changed, 7003 insertions(+)
---
base-commit: ffd2cb6b718e189e7e2d5d0c19c25611f92e061a
change-id: 20231208-gunyah-952aca7668e0

Best regards,