Message ID | 1440387111-23689-1-git-send-email-bharata@linux.vnet.ibm.com |
---|---|
State | New |
On 08/24/2015 09:01 AM, Bharata B Rao wrote:
> The hash table size allocated to guest depends on the maxmem size.
> If the host isn't able to allocate the required hash table size but
> instead allocates less than the optimal requested size, then it will
> not be possible to grow the RAM until maxmem via memory hotplug.
> Attempts to hotplug memory till maxmem could fail and this failure
> isn't being currently handled gracefully by the guest kernel thereby
> causing guest kernel oops.
>
> This should eventually get fixed when we move to completely in-kernel
> memory hotplug instead of the current method where userspace tool drmgr
> drives the hotplug. Until the in-kernel memory hotplug is available
> for PowerKVM, disable memory hotplug when requested hash table size
> isn't allocated.

Even when the in-kernel memory hotplug is available on PKVM, it still
makes sense to disable memory hotplug when the hash table size received
from the host is not sufficient for the permissible/requested maximum
memory size of the guest. What's the point of enabling memory hotplug
when we know it cannot fulfill all the memory hotplug requests?

IIUC, the hash table size received from the host can sometimes be greater
than what is required for the current memory size and less than what is
required for the maximum hot-pluggable memory on the guest. With this
patch, in that case we will disable memory hotplug, but then why use a
hash table size which is bigger than required for the current memory
size? We will not be doing *any* memory hotplug at all afterwards, so
let's shrink the hash table size to what is just required for the current
memory requested by the guest and save some RAM on the system.
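To make the sizing argument above concrete, here is a minimal sketch of how a hash table shift could be derived from a RAM size, and how much memory would be saved by shrinking an oversized table. The 1/128 table-to-RAM ratio, the 18..46 shift bounds, and the function names `htab_shift_for` and `htab_bytes_saved` are assumptions chosen for illustration; they are not taken from this thread or from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define HTAB_SHIFT_MIN       18  /* assumed smallest table: 256 KiB */
#define HTAB_SHIFT_MAX       46  /* assumed upper bound for this sketch */
#define HTAB_RAM_RATIO_SHIFT 7   /* assumed target: table = RAM / 128 */

/* Smallest shift whose table, scaled by the assumed ratio, covers ram_bytes. */
static int htab_shift_for(uint64_t ram_bytes)
{
    int shift = HTAB_SHIFT_MIN;

    assert(ram_bytes > 0);
    while (shift < HTAB_SHIFT_MAX &&
           (1ULL << (shift + HTAB_RAM_RATIO_SHIFT)) < ram_bytes) {
        shift++;
    }
    return shift;
}

/*
 * Bytes that would be freed by sizing the table for the RAM actually
 * present instead of keeping the (larger) table the host granted.
 */
static uint64_t htab_bytes_saved(int granted_shift, uint64_t ram_bytes)
{
    int needed_shift = htab_shift_for(ram_bytes);

    if (granted_shift <= needed_shift) {
        return 0;
    }
    return (1ULL << granted_shift) - (1ULL << needed_shift);
}
```

Under these assumptions, a guest with 4 GiB of RAM and a 32 GiB maxmem would ask for shift 28 but only need shift 25 for its current memory. If the host granted shift 26, hotplug would be disabled by the patch, and shrinking to shift 25 would free 32 MiB.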
On Mon, Aug 24, 2015 at 09:01:51AM +0530, Bharata B Rao wrote:
> The hash table size allocated to guest depends on the maxmem size.
> If the host isn't able to allocate the required hash table size but
> instead allocates less than the optimal requested size, then it will
> not be possible to grow the RAM until maxmem via memory hotplug.
> Attempts to hotplug memory till maxmem could fail and this failure
> isn't being currently handled gracefully by the guest kernel thereby
> causing guest kernel oops.
>
> This should eventually get fixed when we move to completely in-kernel
> memory hotplug instead of the current method where userspace tool drmgr
> drives the hotplug. Until the in-kernel memory hotplug is available
> for PowerKVM, disable memory hotplug when requested hash table size
> isn't allocated.

David - Do you have any views on how to go about this ? Due to the way
we do hotplug currently using drmgr, it appears that it is very difficult
to have a graceful recovery within the guest kernel when a memory hotplug
request can't be fulfilled due to insufficient HTAB size. (Anshuman can
elaborate on this with the exact description on why it is so hard to
recover).

Do you think disabling memory hotplug upfront is a reasonable workaround
for this problem ?

Nathan - When you enable in-kernel memory hotplug for PowerKVM, will you
be exporting something for the userspace (capability ?) to check and
determine the presence of in-kernel memory hotplug feature so that we
can depend on graceful recovery instead of upfront disablement of
memory hotplug from QEMU ?

Regards,
Bharata.
On Wed, Sep 02, 2015 at 08:58:54AM +0530, Bharata B Rao wrote:
> On Mon, Aug 24, 2015 at 09:01:51AM +0530, Bharata B Rao wrote:
> > The hash table size allocated to guest depends on the maxmem size.
> > If the host isn't able to allocate the required hash table size but
> > instead allocates less than the optimal requested size, then it will
> > not be possible to grow the RAM until maxmem via memory hotplug.
> > Attempts to hotplug memory till maxmem could fail and this failure
> > isn't being currently handled gracefully by the guest kernel thereby
> > causing guest kernel oops.
> >
> > This should eventually get fixed when we move to completely in-kernel
> > memory hotplug instead of the current method where userspace tool drmgr
> > drives the hotplug. Until the in-kernel memory hotplug is available
> > for PowerKVM, disable memory hotplug when requested hash table size
> > isn't allocated.
>
> David - Do you have any views on how to go about this ? Due to the way
> we do hotplug currently using drmgr, it appears that it is very difficult
> to have a graceful recovery within the guest kernel when memory hotplug
> request can't be fulfilled due to insufficient HTAB size. (Anshuman can
> elaborate on this with the exact description on why it is so hard to
> recover).
>
> Do you think disabling memory hotplug upfront is a reasonable workaround
> for this problem ?
>
> Nathan - When you enable in-kernel memory hotplug for PowerKVM, will you
> be exporting something for the userspace (capability ?) to check and
> determine the presence of in-kernel memory hotplug feature so that we
> can depend on graceful recovery instead of upfront disablement of
> memory hotplug from QEMU ?

So, I kind of dislike magically disabling requested options - it can
make debugging problems really confusing. In theory, what I'd prefer
is to just not start the guest if we don't get a big enough hash table
to cover maxram.

Unfortunately we don't discover this until reset time, at which point
it is not straightforward to bail out cleanly :/
On 09/01/2015 10:28 PM, Bharata B Rao wrote:
> On Mon, Aug 24, 2015 at 09:01:51AM +0530, Bharata B Rao wrote:
>> The hash table size allocated to guest depends on the maxmem size.
>> If the host isn't able to allocate the required hash table size but
>> instead allocates less than the optimal requested size, then it will
>> not be possible to grow the RAM until maxmem via memory hotplug.
>> Attempts to hotplug memory till maxmem could fail and this failure
>> isn't being currently handled gracefully by the guest kernel thereby
>> causing guest kernel oops.
>>
>> This should eventually get fixed when we move to completely in-kernel
>> memory hotplug instead of the current method where userspace tool drmgr
>> drives the hotplug. Until the in-kernel memory hotplug is available
>> for PowerKVM, disable memory hotplug when requested hash table size
>> isn't allocated.
>
> David - Do you have any views on how to go about this ? Due to the way
> we do hotplug currently using drmgr, it appears that it is very difficult
> to have a graceful recovery within the guest kernel when memory hotplug
> request can't be fulfilled due to insufficient HTAB size. (Anshuman can
> elaborate on this with the exact description on why it is so hard to
> recover).
>
> Do you think disabling memory hotplug upfront is a reasonable workaround
> for this problem ?
>
> Nathan - When you enable in-kernel memory hotplug for PowerKVM, will you
> be exporting something for the userspace (capability ?) to check and
> determine the presence of in-kernel memory hotplug feature so that we
> can depend on graceful recovery instead of upfront disablement of
> memory hotplug from QEMU ?
>

I did not have any plans currently to export something indicating we are
using the in-kernel memory hotplug code.

Perhaps this is something we should consider adding to the PAPR update
proposal that is being worked? Something to indicate we can gracefully handle
adding memory beyond HTAB size.

-Nathan
Quoting Nathan Fontenot (2015-09-03 13:50:59)
> On 09/01/2015 10:28 PM, Bharata B Rao wrote:
> > On Mon, Aug 24, 2015 at 09:01:51AM +0530, Bharata B Rao wrote:
> >> The hash table size allocated to guest depends on the maxmem size.
> >> If the host isn't able to allocate the required hash table size but
> >> instead allocates less than the optimal requested size, then it will
> >> not be possible to grow the RAM until maxmem via memory hotplug.
> >> Attempts to hotplug memory till maxmem could fail and this failure
> >> isn't being currently handled gracefully by the guest kernel thereby
> >> causing guest kernel oops.
> >>
> >> This should eventually get fixed when we move to completely in-kernel
> >> memory hotplug instead of the current method where userspace tool drmgr
> >> drives the hotplug. Until the in-kernel memory hotplug is available
> >> for PowerKVM, disable memory hotplug when requested hash table size
> >> isn't allocated.
> >
> > David - Do you have any views on how to go about this ? Due to the way
> > we do hotplug currently using drmgr, it appears that it is very difficult
> > to have a graceful recovery within the guest kernel when memory hotplug
> > request can't be fulfilled due to insufficient HTAB size. (Anshuman can
> > elaborate on this with the exact description on why it is so hard to
> > recover).
> >
> > Do you think disabling memory hotplug upfront is a reasonable workaround
> > for this problem ?
> >
> > Nathan - When you enable in-kernel memory hotplug for PowerKVM, will you
> > be exporting something for the userspace (capability ?) to check and
> > determine the presence of in-kernel memory hotplug feature so that we
> > can depend on graceful recovery instead of upfront disablement of
> > memory hotplug from QEMU ?
> >
>
> I did not have any plans currently to export something indicating we are
> using the in-kernel memory hotplug code.
>
> Perhaps this is something we should consider adding to the PAPR update
> proposal that is being worked? Something to indicate we can gracefully handle
> adding memory beyond HTAB size.

That might make sense, but I'm curious what constitutes graceful
recovery in this context. What can we do with in-kernel hotplug that's not
possible with userspace tools? If it's graceful failure, is there really
nothing that can be done by QEMU at the DRC level to get the same
result?

>
> -Nathan
>
On 09/04/2015 10:33 AM, Michael Roth wrote:
> Quoting Nathan Fontenot (2015-09-03 13:50:59)
>> On 09/01/2015 10:28 PM, Bharata B Rao wrote:
>>> On Mon, Aug 24, 2015 at 09:01:51AM +0530, Bharata B Rao wrote:
>>>> The hash table size allocated to guest depends on the maxmem size.
>>>> If the host isn't able to allocate the required hash table size but
>>>> instead allocates less than the optimal requested size, then it will
>>>> not be possible to grow the RAM until maxmem via memory hotplug.
>>>> Attempts to hotplug memory till maxmem could fail and this failure
>>>> isn't being currently handled gracefully by the guest kernel thereby
>>>> causing guest kernel oops.
>>>>
>>>> This should eventually get fixed when we move to completely in-kernel
>>>> memory hotplug instead of the current method where userspace tool drmgr
>>>> drives the hotplug. Until the in-kernel memory hotplug is available
>>>> for PowerKVM, disable memory hotplug when requested hash table size
>>>> isn't allocated.
>>>
>>> David - Do you have any views on how to go about this ? Due to the way
>>> we do hotplug currently using drmgr, it appears that it is very difficult
>>> to have a graceful recovery within the guest kernel when memory hotplug
>>> request can't be fulfilled due to insufficient HTAB size. (Anshuman can
>>> elaborate on this with the exact description on why it is so hard to
>>> recover).
>>>
>>> Do you think disabling memory hotplug upfront is a reasonable workaround
>>> for this problem ?
>>>
>>> Nathan - When you enable in-kernel memory hotplug for PowerKVM, will you
>>> be exporting something for the userspace (capability ?) to check and
>>> determine the presence of in-kernel memory hotplug feature so that we
>>> can depend on graceful recovery instead of upfront disablement of
>>> memory hotplug from QEMU ?
>>>
>>
>> I did not have any plans currently to export something indicating we are
>> using the in-kernel memory hotplug code.
>>
>> Perhaps this is something we should consider adding to the PAPR update
>> proposal that is being worked? Something to indicate we can gracefully handle
>> adding memory beyond HTAB size.
>
> That might make sense, but I'm curious what constitutes graceful
> recovery in this context. What can we do with in-kernel hotplug that's not
> possible with userspace tools? If it's graceful failure, is there really
> nothing that can be done by QEMU at the DRC level to get the same
> result?

I don't have an answer for how to recover gracefully or if it will be possible.

If/when we can determine how to do that, my thought was to use the PAPR updates
we are working on to indicate to QEMU that the guest is able to handle this
situation.

-Nathan
Quoting Nathan Fontenot (2015-09-04 10:49:18)
> On 09/04/2015 10:33 AM, Michael Roth wrote:
> > Quoting Nathan Fontenot (2015-09-03 13:50:59)
> >> On 09/01/2015 10:28 PM, Bharata B Rao wrote:
> >>> On Mon, Aug 24, 2015 at 09:01:51AM +0530, Bharata B Rao wrote:
> >>>> The hash table size allocated to guest depends on the maxmem size.
> >>>> If the host isn't able to allocate the required hash table size but
> >>>> instead allocates less than the optimal requested size, then it will
> >>>> not be possible to grow the RAM until maxmem via memory hotplug.
> >>>> Attempts to hotplug memory till maxmem could fail and this failure
> >>>> isn't being currently handled gracefully by the guest kernel thereby
> >>>> causing guest kernel oops.
> >>>>
> >>>> This should eventually get fixed when we move to completely in-kernel
> >>>> memory hotplug instead of the current method where userspace tool drmgr
> >>>> drives the hotplug. Until the in-kernel memory hotplug is available
> >>>> for PowerKVM, disable memory hotplug when requested hash table size
> >>>> isn't allocated.
> >>>
> >>> David - Do you have any views on how to go about this ? Due to the way
> >>> we do hotplug currently using drmgr, it appears that it is very difficult
> >>> to have a graceful recovery within the guest kernel when memory hotplug
> >>> request can't be fulfilled due to insufficient HTAB size. (Anshuman can
> >>> elaborate on this with the exact description on why it is so hard to
> >>> recover).
> >>>
> >>> Do you think disabling memory hotplug upfront is a reasonable workaround
> >>> for this problem ?
> >>>
> >>> Nathan - When you enable in-kernel memory hotplug for PowerKVM, will you
> >>> be exporting something for the userspace (capability ?) to check and
> >>> determine the presence of in-kernel memory hotplug feature so that we
> >>> can depend on graceful recovery instead of upfront disablement of
> >>> memory hotplug from QEMU ?
> >>>
> >>
> >> I did not have any plans currently to export something indicating we are
> >> using the in-kernel memory hotplug code.
> >>
> >> Perhaps this is something we should consider adding to the PAPR update
> >> proposal that is being worked? Something to indicate we can gracefully handle
> >> adding memory beyond HTAB size.
> >
> > That might make sense, but I'm curious what constitutes graceful
> > recovery in this context. What can we do with in-kernel hotplug that's not
> > possible with userspace tools? If it's graceful failure, is there really
> > nothing that can be done by QEMU at the DRC level to get the same
> > result?
>
> I don't have an answer for how to recover gracefully or if it will be possible.

Sorry, I meant it as a general question. Bharata mentioned Anshuman might have
some further details?

> If/when we can determine how to do that my thought was to use the PAPR updates
> we are working on to indicate to QEMU that the guest is able to handle this
> situation.

Agreed, if it's something that only makes sense for updated guest kernels,
an architecture flag would be good. But if it's possible to do something
compatible with existing guests, that would be ideal. Not sure that's been
ruled out yet.

>
> -Nathan
>
On 09/04/2015 09:42 PM, Michael Roth wrote:
> Quoting Nathan Fontenot (2015-09-04 10:49:18)
>> On 09/04/2015 10:33 AM, Michael Roth wrote:
>>> Quoting Nathan Fontenot (2015-09-03 13:50:59)
>>>> On 09/01/2015 10:28 PM, Bharata B Rao wrote:
>>>>> On Mon, Aug 24, 2015 at 09:01:51AM +0530, Bharata B Rao wrote:
>>>>>> The hash table size allocated to guest depends on the maxmem size.
>>>>>> If the host isn't able to allocate the required hash table size but
>>>>>> instead allocates less than the optimal requested size, then it will
>>>>>> not be possible to grow the RAM until maxmem via memory hotplug.
>>>>>> Attempts to hotplug memory till maxmem could fail and this failure
>>>>>> isn't being currently handled gracefully by the guest kernel thereby
>>>>>> causing guest kernel oops.
>>>>>>
>>>>>> This should eventually get fixed when we move to completely in-kernel
>>>>>> memory hotplug instead of the current method where userspace tool drmgr
>>>>>> drives the hotplug. Until the in-kernel memory hotplug is available
>>>>>> for PowerKVM, disable memory hotplug when requested hash table size
>>>>>> isn't allocated.
>>>>>
>>>>> David - Do you have any views on how to go about this ? Due to the way
>>>>> we do hotplug currently using drmgr, it appears that it is very difficult
>>>>> to have a graceful recovery within the guest kernel when memory hotplug
>>>>> request can't be fulfilled due to insufficient HTAB size. (Anshuman can
>>>>> elaborate on this with the exact description on why it is so hard to
>>>>> recover).
>>>>>
>>>>> Do you think disabling memory hotplug upfront is a reasonable workaround
>>>>> for this problem ?
>>>>>
>>>>> Nathan - When you enable in-kernel memory hotplug for PowerKVM, will you
>>>>> be exporting something for the userspace (capability ?) to check and
>>>>> determine the presence of in-kernel memory hotplug feature so that we
>>>>> can depend on graceful recovery instead of upfront disablement of
>>>>> memory hotplug from QEMU ?
>>>>>
>>>>
>>>> I did not have any plans currently to export something indicating we are
>>>> using the in-kernel memory hotplug code.
>>>>
>>>> Perhaps this is something we should consider adding to the PAPR update
>>>> proposal that is being worked? Something to indicate we can gracefully handle
>>>> adding memory beyond HTAB size.
>>>
>>> That might make sense, but I'm curious what constitutes graceful
>>> recovery in this context. What can we do with in-kernel hotplug that's not
>>> possible with userspace tools? If it's graceful failure, is there really
>>> nothing that can be done by QEMU at the DRC level to get the same
>>> result?
>>
>> I don't have an answer for how to recover gracefully or if it will be possible.
>
> Sorry, I meant it as a general question. Bharata mentioned Anshuman might have
> some further details?

Graceful recovery in the kernel seems to be difficult (though I cannot
say whether it is impossible) because of the way we have implemented the
memory hotplug function with the help of the userspace tool called
'drmgr'. It has two distinct steps in which it achieves memory hotplug
after receiving platform notification.

(1) Update the /proc/ofdt
(2) Write into /sys/devices/system/memory/probe

Both of the above steps try to add the new memory block into the
kernel (generic and arch specific representations). Now if step (2)
fails, we restore the /proc/ofdt value to the original state present
before we started the hotplug operation. In short, this does not roll
back all the changes we had done in step (2) and step (1) gracefully.
One of the reasons is that it happens in two distinct steps from
userspace. Had it been attempted through a single step, the kernel would
have right away reverted any changes before exiting back into
userspace. The new in-kernel memory hotplug method follows this principle
now.
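The single-step principle described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `struct guest_mem_state` and every function name here are hypothetical, nothing below is kernel or drmgr code, and the real steps touch far more state than two counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the guest state touched by a memory add. */
struct guest_mem_state {
    int dt_blocks;      /* memory blocks described in the device tree */
    int online_blocks;  /* memory blocks actually mapped by the kernel */
    int htab_capacity;  /* number of blocks the hash table can map */
};

/* Step (1) in the model: describe the new block in the device tree. */
static void dt_add_block(struct guest_mem_state *s)    { s->dt_blocks++; }
static void dt_remove_block(struct guest_mem_state *s) { s->dt_blocks--; }

/* Step (2) in the model: map the block; fails once the hash table is full. */
static bool map_block(struct guest_mem_state *s)
{
    if (s->online_blocks >= s->htab_capacity) {
        return false;
    }
    s->online_blocks++;
    return true;
}

/*
 * Single-step add: both steps run under one caller, so a failure in
 * step (2) can undo step (1) before control returns. Either both
 * counters advance or neither does.
 */
static bool hotplug_add_block(struct guest_mem_state *s)
{
    assert(s != NULL);
    dt_add_block(s);
    if (!map_block(s)) {
        dt_remove_block(s);
        return false;
    }
    return true;
}
```

In this model a failed add leaves `dt_blocks` and `online_blocks` exactly as they were, which is the property the two-step userspace flow cannot guarantee.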
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index c3268c5..4a07a7d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -92,6 +92,9 @@
 
 #define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
 
+/* TODO: Move this to sPAPRMachineState ? */
+static bool spapr_memory_hotplug_disabled;
+
 static XICSState *try_create_xics(const char *type, int nr_servers,
                                   int nr_irqs, Error **errp)
 {
@@ -983,6 +986,14 @@ static void spapr_reset_htab(sPAPRMachineState *spapr)
 
     if (shift > 0) {
         /* Kernel handles htab, we don't need to allocate one */
+        if (shift != spapr->htab_shift) {
+            /*
+             * Disable memory hotplug since we didn't get the requested
+             * hash table size.
+             */
+            spapr_memory_hotplug_disabled = true;
+        }
+
         spapr->htab_shift = shift;
         kvmppc_kern_htab = true;
 
@@ -2149,6 +2160,11 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
             return;
         }
 
+        if (spapr_memory_hotplug_disabled) {
+            error_setg(errp, "Insufficient HTAB size to support memory hotplug");
+            return;
+        }
+
         spapr_memory_plug(hotplug_dev, dev, node, errp);
     }
 }
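The control flow of the patch can be modelled outside QEMU. The sketch below follows the TODO in the diff by keeping the flag in a per-machine structure rather than a file-scope static. `struct machine_model`, `model_reset_htab` and `model_memory_plug` are hypothetical names introduced for illustration; only the comparison logic and the error string come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant sPAPRMachineState fields. */
struct machine_model {
    int htab_shift;                /* requested shift, then the granted one */
    bool memory_hotplug_disabled;  /* per-machine instead of file-static */
};

/*
 * Mirrors the reset-time check in the patch: a positive granted_shift
 * means the kernel allocated the table, possibly smaller than requested.
 */
static void model_reset_htab(struct machine_model *m, int granted_shift)
{
    assert(m != NULL);
    if (granted_shift > 0) {
        if (granted_shift != m->htab_shift) {
            m->memory_hotplug_disabled = true;
        }
        m->htab_shift = granted_shift;
    }
}

/* Returns NULL on success, or an error message when the plug is refused. */
static const char *model_memory_plug(const struct machine_model *m)
{
    assert(m != NULL);
    if (m->memory_hotplug_disabled) {
        return "Insufficient HTAB size to support memory hotplug";
    }
    return NULL;
}
```

In this model the flag is sticky: once a shortfall has been seen, `htab_shift` holds the granted value and nothing clears `memory_hotplug_disabled` on a later reset.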
The hash table size allocated to guest depends on the maxmem size.
If the host isn't able to allocate the required hash table size but
instead allocates less than the optimal requested size, then it will
not be possible to grow the RAM until maxmem via memory hotplug.
Attempts to hotplug memory till maxmem could fail and this failure
isn't being currently handled gracefully by the guest kernel thereby
causing guest kernel oops.

This should eventually get fixed when we move to completely in-kernel
memory hotplug instead of the current method where userspace tool drmgr
drives the hotplug. Until the in-kernel memory hotplug is available
for PowerKVM, disable memory hotplug when requested hash table size
isn't allocated.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
Applies against spapr-next branch of David Gibson's tree.

 hw/ppc/spapr.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)