diff mbox series

[for-3.0,v2] pc: acpi: fix memory hotplug regression by reducing stub SRAT entry size

Message ID 1532943701-83614-1-git-send-email-imammedo@redhat.com
State New
Headers show
Series [for-3.0,v2] pc: acpi: fix memory hotplug regression by reducing stub SRAT entry size | expand

Commit Message

Igor Mammedov July 30, 2018, 9:41 a.m. UTC
Commit 848a1cc1e (hw/acpi-build: build SRAT memory affinity structures for DIMM devices)
broke the first dimm hotplug in following cases:

 1: there is no coldplugged dimm in the last numa node
    but there is a coldplugged dimm in another node

  -m 4096,slots=4,maxmem=32G               \
  -object memory-backend-ram,id=m0,size=2G \
  -device pc-dimm,memdev=m0,node=0         \
  -numa node,nodeid=0                      \
  -numa node,nodeid=1

 2: if order of dimms on CLI is:
       1st plugged dimm in node1
       2nd plugged dimm in node0

  -m 4096,slots=4,maxmem=32G               \
  -object memory-backend-ram,size=2G,id=m0 \
  -device pc-dimm,memdev=m0,node=1         \
  -object memory-backend-ram,id=m1,size=2G \
  -device pc-dimm,memdev=m1,node=0         \
  -numa node,nodeid=0                      \
  -numa node,nodeid=1

(qemu) object_add memory-backend-ram,id=m2,size=1G
(qemu) device_add pc-dimm,memdev=m2,node=0

the first DIMM hotplug to any node except the last one
fails (Windows is unable to online it).

Length reduction of stub hotplug memory SRAT entry,
fixes issue for some reason.

RHBZ: 1609234

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
NOTE TO MAINTAINER: UPDATE REFERENCE APCI TABLE BLOBS

v2:
 fixup examples in commit message
 (they were in wrong order and with duplicate memdev=m1)
---
 hw/i386/acpi-build.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

Comments

Eduardo Habkost July 30, 2018, 8:26 p.m. UTC | #1
On Mon, Jul 30, 2018 at 11:41:41AM +0200, Igor Mammedov wrote:
> Commit 848a1cc1e (hw/acpi-build: build SRAT memory affinity structures for DIMM devices)
> broke the first dimm hotplug in following cases:
> 
>  1: there is no coldplugged dimm in the last numa node
>     but there is a coldplugged dimm in another node
> 
>   -m 4096,slots=4,maxmem=32G               \
>   -object memory-backend-ram,id=m0,size=2G \
>   -device pc-dimm,memdev=m0,node=0         \
>   -numa node,nodeid=0                      \
>   -numa node,nodeid=1
> 
>  2: if order of dimms on CLI is:
>        1st plugged dimm in node1
>        2nd plugged dimm in node0
> 
>   -m 4096,slots=4,maxmem=32G               \
>   -object memory-backend-ram,size=2G,id=m0 \
>   -device pc-dimm,memdev=m0,node=1         \
>   -object memory-backend-ram,id=m1,size=2G \
>   -device pc-dimm,memdev=m1,node=0         \
>   -numa node,nodeid=0                      \
>   -numa node,nodeid=1
> 
> (qemu) object_add memory-backend-ram,id=m2,size=1G
> (qemu) device_add pc-dimm,memdev=m2,node=0
> 
> the first DIMM hotplug to any node except the last one
> fails (Windows is unable to online it).
> 
> Length reduction of stub hotplug memory SRAT entry,
> fixes issue for some reason.
> 

I'm really bothered by the lack of automated testing for all
these NUMA/ACPI corner cases.

This looks like a good candidate for an avocado_qemu test case.
Can you show pseudo-code of how exactly the bug fix could be
verified, so we can try to write a test case?


> RHBZ: 1609234

I suggest including full URL of bug report[1].  I doubt anybody
outside Red Hat knows what "RHBZ" means.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1609234

> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> NOTE TO MAINTAINER: UPDATE REFERENCE APCI TABLE BLOBS
> 
> v2:
>  fixup examples in commit message
>  (they were in wrong order and with duplicate memdev=m1)
> ---
>  hw/i386/acpi-build.c | 19 ++++++++++---------
>  1 file changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 9e8350c..b52fdb2 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2269,7 +2269,16 @@ static void build_srat_hotpluggable_memory(GArray *table_data, uint64_t base,
>          numamem = acpi_data_push(table_data, sizeof *numamem);
>  
>          if (!info) {
> -            build_srat_memory(numamem, cur, end - cur, default_node,
> +            /*
> +             * Entry is required for Windows to enable memory hotplug in OS
> +             * and for Linux to enable SWIOTLB when booted with less than
> +             * 4G of RAM. Windows works better if the entry sets proximity
> +             * to the highest NUMA node in the machine at the end of the
> +             * reserved space.
> +             * Memory devices may override proximity set by this entry,
> +             * providing _PXM method if necessary.
> +             */
> +            build_srat_memory(numamem, end - 1, 1, default_node,
>                                MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
>              break;
>          }
> @@ -2402,14 +2411,6 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
>          build_srat_memory(numamem, 0, 0, 0, MEM_AFFINITY_NOFLAGS);
>      }
>  
> -    /*
> -     * Entry is required for Windows to enable memory hotplug in OS
> -     * and for Linux to enable SWIOTLB when booted with less than
> -     * 4G of RAM. Windows works better if the entry sets proximity
> -     * to the highest NUMA node in the machine.
> -     * Memory devices may override proximity set by this entry,
> -     * providing _PXM method if necessary.
> -     */
>      if (hotplugabble_address_space_size) {
>          build_srat_hotpluggable_memory(table_data, machine->device_memory->base,
>                                         hotplugabble_address_space_size,
> -- 
> 2.7.4
>
Igor Mammedov July 31, 2018, 9:53 a.m. UTC | #2
On Mon, 30 Jul 2018 17:26:24 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Mon, Jul 30, 2018 at 11:41:41AM +0200, Igor Mammedov wrote:
> > Commit 848a1cc1e (hw/acpi-build: build SRAT memory affinity structures for DIMM devices)
> > broke the first dimm hotplug in following cases:
> > 
> >  1: there is no coldplugged dimm in the last numa node
> >     but there is a coldplugged dimm in another node
> > 
> >   -m 4096,slots=4,maxmem=32G               \
> >   -object memory-backend-ram,id=m0,size=2G \
> >   -device pc-dimm,memdev=m0,node=0         \
> >   -numa node,nodeid=0                      \
> >   -numa node,nodeid=1
> > 
> >  2: if order of dimms on CLI is:
> >        1st plugged dimm in node1
> >        2nd plugged dimm in node0
> > 
> >   -m 4096,slots=4,maxmem=32G               \
> >   -object memory-backend-ram,size=2G,id=m0 \
> >   -device pc-dimm,memdev=m0,node=1         \
> >   -object memory-backend-ram,id=m1,size=2G \
> >   -device pc-dimm,memdev=m1,node=0         \
> >   -numa node,nodeid=0                      \
> >   -numa node,nodeid=1
> > 
> > (qemu) object_add memory-backend-ram,id=m2,size=1G
> > (qemu) device_add pc-dimm,memdev=m2,node=0
> > 
> > the first DIMM hotplug to any node except the last one
> > fails (Windows is unable to online it).
> > 
> > Length reduction of stub hotplug memory SRAT entry,
> > fixes issue for some reason.
> >   
> 
> I'm really bothered by the lack of automated testing for all
> these NUMA/ACPI corner cases.
> 
> This looks like a good candidate for an avocado_qemu test case.
> Can you show pseudo-code of how exactly the bug fix could be
> verified, so we can try to write a test case?
Sadly I do it manually every time I'm suspect a patch would
affect the feature. On just has to check if a new memory device
appeared in device manager and it is in working state (started successfully).
One also need to run it against to test it against windows version
that supports memory hot-add (DC ed.).

It's typically what RHEL QE does, and they just found
a new case which wasn't on test list so proactive measures
wouldn't work here in any case as we didn't know about
this particular combination.

I'm not sure how it will work with upstream avocado though,
windows testing implies tester would need access to MSDN
subscription or multiple retail versions to test against.
So with windows it becomes expensive and complicated
hence I'd leave this job to QE which has resources and
upstream would benefit from downstream when a bug is found
(albeit it's a catch up game).

> > RHBZ: 1609234  
> 
> I suggest including full URL of bug report[1].  I doubt anybody
> outside Red Hat knows what "RHBZ" means.
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1609234
Next time I'll do so

> 
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> > NOTE TO MAINTAINER: UPDATE REFERENCE APCI TABLE BLOBS
> > 
> > v2:
> >  fixup examples in commit message
> >  (they were in wrong order and with duplicate memdev=m1)
> > ---
> >  hw/i386/acpi-build.c | 19 ++++++++++---------
> >  1 file changed, 10 insertions(+), 9 deletions(-)
> > 
> > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > index 9e8350c..b52fdb2 100644
> > --- a/hw/i386/acpi-build.c
> > +++ b/hw/i386/acpi-build.c
> > @@ -2269,7 +2269,16 @@ static void build_srat_hotpluggable_memory(GArray *table_data, uint64_t base,
> >          numamem = acpi_data_push(table_data, sizeof *numamem);
> >  
> >          if (!info) {
> > -            build_srat_memory(numamem, cur, end - cur, default_node,
> > +            /*
> > +             * Entry is required for Windows to enable memory hotplug in OS
> > +             * and for Linux to enable SWIOTLB when booted with less than
> > +             * 4G of RAM. Windows works better if the entry sets proximity
> > +             * to the highest NUMA node in the machine at the end of the
> > +             * reserved space.
> > +             * Memory devices may override proximity set by this entry,
> > +             * providing _PXM method if necessary.
> > +             */
> > +            build_srat_memory(numamem, end - 1, 1, default_node,
> >                                MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
> >              break;
> >          }
> > @@ -2402,14 +2411,6 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
> >          build_srat_memory(numamem, 0, 0, 0, MEM_AFFINITY_NOFLAGS);
> >      }
> >  
> > -    /*
> > -     * Entry is required for Windows to enable memory hotplug in OS
> > -     * and for Linux to enable SWIOTLB when booted with less than
> > -     * 4G of RAM. Windows works better if the entry sets proximity
> > -     * to the highest NUMA node in the machine.
> > -     * Memory devices may override proximity set by this entry,
> > -     * providing _PXM method if necessary.
> > -     */
> >      if (hotplugabble_address_space_size) {
> >          build_srat_hotpluggable_memory(table_data, machine->device_memory->base,
> >                                         hotplugabble_address_space_size,
> > -- 
> > 2.7.4
> >   
>
Eduardo Habkost July 31, 2018, 3:03 p.m. UTC | #3
On Tue, Jul 31, 2018 at 11:53:40AM +0200, Igor Mammedov wrote:
> On Mon, 30 Jul 2018 17:26:24 -0300
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > On Mon, Jul 30, 2018 at 11:41:41AM +0200, Igor Mammedov wrote:
> > > Commit 848a1cc1e (hw/acpi-build: build SRAT memory affinity structures for DIMM devices)
> > > broke the first dimm hotplug in following cases:
> > > 
> > >  1: there is no coldplugged dimm in the last numa node
> > >     but there is a coldplugged dimm in another node
> > > 
> > >   -m 4096,slots=4,maxmem=32G               \
> > >   -object memory-backend-ram,id=m0,size=2G \
> > >   -device pc-dimm,memdev=m0,node=0         \
> > >   -numa node,nodeid=0                      \
> > >   -numa node,nodeid=1
> > > 
> > >  2: if order of dimms on CLI is:
> > >        1st plugged dimm in node1
> > >        2nd plugged dimm in node0
> > > 
> > >   -m 4096,slots=4,maxmem=32G               \
> > >   -object memory-backend-ram,size=2G,id=m0 \
> > >   -device pc-dimm,memdev=m0,node=1         \
> > >   -object memory-backend-ram,id=m1,size=2G \
> > >   -device pc-dimm,memdev=m1,node=0         \
> > >   -numa node,nodeid=0                      \
> > >   -numa node,nodeid=1
> > > 
> > > (qemu) object_add memory-backend-ram,id=m2,size=1G
> > > (qemu) device_add pc-dimm,memdev=m2,node=0
> > > 
> > > the first DIMM hotplug to any node except the last one
> > > fails (Windows is unable to online it).
> > > 
> > > Length reduction of stub hotplug memory SRAT entry,
> > > fixes issue for some reason.
> > >   
> > 
> > I'm really bothered by the lack of automated testing for all
> > these NUMA/ACPI corner cases.
> > 
> > This looks like a good candidate for an avocado_qemu test case.
> > Can you show pseudo-code of how exactly the bug fix could be
> > verified, so we can try to write a test case?
> Sadly I do it manually every time I'm suspect a patch would
> affect the feature. On just has to check if a new memory device
> appeared in device manager and it is in working state (started successfully).
> One also need to run it against to test it against windows version
> that supports memory hot-add (DC ed.).
> 
> It's typically what RHEL QE does, and they just found
> a new case which wasn't on test list so proactive measures
> wouldn't work here in any case as we didn't know about
> this particular combination.
> 
> I'm not sure how it will work with upstream avocado though,
> windows testing implies tester would need access to MSDN
> subscription or multiple retail versions to test against.
> So with windows it becomes expensive and complicated
> hence I'd leave this job to QE which has resources and
> upstream would benefit from downstream when a bug is found
> (albeit it's a catch up game).

I don't mean functional testing of Windows guests.  I'm just
looking for a way we can ensure we won't reintroduce this
particular bug later.  We should be able to encode known
requirements of existing guest OSes in test code (especially the
undocumented requirements).

In other words, we need test code that will check if the entry
you are adding below is still present and contains the right
flags, so people won't remove it by mistake.


[...]
> > > @@ -2269,7 +2269,16 @@ static void build_srat_hotpluggable_memory(GArray *table_data, uint64_t base,
> > >          numamem = acpi_data_push(table_data, sizeof *numamem);
> > >  
> > >          if (!info) {
> > > -            build_srat_memory(numamem, cur, end - cur, default_node,
> > > +            /*
> > > +             * Entry is required for Windows to enable memory hotplug in OS
> > > +             * and for Linux to enable SWIOTLB when booted with less than
> > > +             * 4G of RAM. Windows works better if the entry sets proximity
> > > +             * to the highest NUMA node in the machine at the end of the
> > > +             * reserved space.
> > > +             * Memory devices may override proximity set by this entry,
> > > +             * providing _PXM method if necessary.
> > > +             */
> > > +            build_srat_memory(numamem, end - 1, 1, default_node,
> > >                                MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
> > >              break;
> > >          }
Igor Mammedov Aug. 2, 2018, 10:09 a.m. UTC | #4
On Tue, 31 Jul 2018 12:03:22 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Tue, Jul 31, 2018 at 11:53:40AM +0200, Igor Mammedov wrote:
> > On Mon, 30 Jul 2018 17:26:24 -0300
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> >   
> > > On Mon, Jul 30, 2018 at 11:41:41AM +0200, Igor Mammedov wrote:  
> > > > Commit 848a1cc1e (hw/acpi-build: build SRAT memory affinity structures for DIMM devices)
> > > > broke the first dimm hotplug in following cases:
> > > > 
> > > >  1: there is no coldplugged dimm in the last numa node
> > > >     but there is a coldplugged dimm in another node
> > > > 
> > > >   -m 4096,slots=4,maxmem=32G               \
> > > >   -object memory-backend-ram,id=m0,size=2G \
> > > >   -device pc-dimm,memdev=m0,node=0         \
> > > >   -numa node,nodeid=0                      \
> > > >   -numa node,nodeid=1
> > > > 
> > > >  2: if order of dimms on CLI is:
> > > >        1st plugged dimm in node1
> > > >        2nd plugged dimm in node0
> > > > 
> > > >   -m 4096,slots=4,maxmem=32G               \
> > > >   -object memory-backend-ram,size=2G,id=m0 \
> > > >   -device pc-dimm,memdev=m0,node=1         \
> > > >   -object memory-backend-ram,id=m1,size=2G \
> > > >   -device pc-dimm,memdev=m1,node=0         \
> > > >   -numa node,nodeid=0                      \
> > > >   -numa node,nodeid=1
> > > > 
> > > > (qemu) object_add memory-backend-ram,id=m2,size=1G
> > > > (qemu) device_add pc-dimm,memdev=m2,node=0
> > > > 
> > > > the first DIMM hotplug to any node except the last one
> > > > fails (Windows is unable to online it).
> > > > 
> > > > Length reduction of stub hotplug memory SRAT entry,
> > > > fixes issue for some reason.
> > > >     
> > > 
> > > I'm really bothered by the lack of automated testing for all
> > > these NUMA/ACPI corner cases.
> > > 
> > > This looks like a good candidate for an avocado_qemu test case.
> > > Can you show pseudo-code of how exactly the bug fix could be
> > > verified, so we can try to write a test case?  
> > Sadly I do it manually every time I'm suspect a patch would
> > affect the feature. On just has to check if a new memory device
> > appeared in device manager and it is in working state (started successfully).
> > One also need to run it against to test it against windows version
> > that supports memory hot-add (DC ed.).
> > 
> > It's typically what RHEL QE does, and they just found
> > a new case which wasn't on test list so proactive measures
> > wouldn't work here in any case as we didn't know about
> > this particular combination.
> > 
> > I'm not sure how it will work with upstream avocado though,
> > windows testing implies tester would need access to MSDN
> > subscription or multiple retail versions to test against.
> > So with windows it becomes expensive and complicated
> > hence I'd leave this job to QE which has resources and
> > upstream would benefit from downstream when a bug is found
> > (albeit it's a catch up game).  
> 
> I don't mean functional testing of Windows guests.  I'm just
> looking for a way we can ensure we won't reintroduce this
> particular bug later.  We should be able to encode known
> requirements of existing guest OSes in test code (especially the
> undocumented requirements).
>
> In other words, we need test code that will check if the entry
> you are adding below is still present and contains the right
> flags, so people won't remove it by mistake.
known requirements are described in acpi code comment and commit
messages and maintainer are supposed to check if a change showed
by bios test is valid and doesn't regress existing state.
Parsing SRAT in test and ensuring that the last entry hasn't changed
won't help, we already have this by doing comparison with reference
SRAT.

And if there is a change, the only thing that can somewhat verify
it is a functional test with windows (known combinations at
least). Some new sequence/combination might regress it again
(like one described in commit). An Avocado functional test running
windows(es) might help if it will test random startup/hotplug combinations,
run by someone with rights to use windows.

I think that once I've contributed cpu hotplug testcases to autotest
but then there appears a new test suite and then another.
I don't really feel nor have capacity to deal with it, if someone
contributes testcase to Avocado and tells me how to easily use it,
I'd gladly run it with windows guests I have access to
whenever I review/test a patch that might affect windows.

> 
> 
> [...]
> > > > @@ -2269,7 +2269,16 @@ static void build_srat_hotpluggable_memory(GArray *table_data, uint64_t base,
> > > >          numamem = acpi_data_push(table_data, sizeof *numamem);
> > > >  
> > > >          if (!info) {
> > > > -            build_srat_memory(numamem, cur, end - cur, default_node,
> > > > +            /*
> > > > +             * Entry is required for Windows to enable memory hotplug in OS
> > > > +             * and for Linux to enable SWIOTLB when booted with less than
> > > > +             * 4G of RAM. Windows works better if the entry sets proximity
> > > > +             * to the highest NUMA node in the machine at the end of the
> > > > +             * reserved space.
> > > > +             * Memory devices may override proximity set by this entry,
> > > > +             * providing _PXM method if necessary.
> > > > +             */
> > > > +            build_srat_memory(numamem, end - 1, 1, default_node,
> > > >                                MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
> > > >              break;
> > > >          }  
>
Eduardo Habkost Aug. 2, 2018, 1:39 p.m. UTC | #5
On Thu, Aug 02, 2018 at 12:09:37PM +0200, Igor Mammedov wrote:
> On Tue, 31 Jul 2018 12:03:22 -0300
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > On Tue, Jul 31, 2018 at 11:53:40AM +0200, Igor Mammedov wrote:
> > > On Mon, 30 Jul 2018 17:26:24 -0300
> > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > >   
> > > > On Mon, Jul 30, 2018 at 11:41:41AM +0200, Igor Mammedov wrote:  
> > > > > Commit 848a1cc1e (hw/acpi-build: build SRAT memory affinity structures for DIMM devices)
> > > > > broke the first dimm hotplug in following cases:
> > > > > 
> > > > >  1: there is no coldplugged dimm in the last numa node
> > > > >     but there is a coldplugged dimm in another node
> > > > > 
> > > > >   -m 4096,slots=4,maxmem=32G               \
> > > > >   -object memory-backend-ram,id=m0,size=2G \
> > > > >   -device pc-dimm,memdev=m0,node=0         \
> > > > >   -numa node,nodeid=0                      \
> > > > >   -numa node,nodeid=1
> > > > > 
> > > > >  2: if order of dimms on CLI is:
> > > > >        1st plugged dimm in node1
> > > > >        2nd plugged dimm in node0
> > > > > 
> > > > >   -m 4096,slots=4,maxmem=32G               \
> > > > >   -object memory-backend-ram,size=2G,id=m0 \
> > > > >   -device pc-dimm,memdev=m0,node=1         \
> > > > >   -object memory-backend-ram,id=m1,size=2G \
> > > > >   -device pc-dimm,memdev=m1,node=0         \
> > > > >   -numa node,nodeid=0                      \
> > > > >   -numa node,nodeid=1
> > > > > 
> > > > > (qemu) object_add memory-backend-ram,id=m2,size=1G
> > > > > (qemu) device_add pc-dimm,memdev=m2,node=0
> > > > > 
> > > > > the first DIMM hotplug to any node except the last one
> > > > > fails (Windows is unable to online it).
> > > > > 
> > > > > Length reduction of stub hotplug memory SRAT entry,
> > > > > fixes issue for some reason.
> > > > >     
> > > > 
> > > > I'm really bothered by the lack of automated testing for all
> > > > these NUMA/ACPI corner cases.
> > > > 
> > > > This looks like a good candidate for an avocado_qemu test case.
> > > > Can you show pseudo-code of how exactly the bug fix could be
> > > > verified, so we can try to write a test case?  
> > > Sadly I do it manually every time I'm suspect a patch would
> > > affect the feature. On just has to check if a new memory device
> > > appeared in device manager and it is in working state (started successfully).
> > > One also need to run it against to test it against windows version
> > > that supports memory hot-add (DC ed.).
> > > 
> > > It's typically what RHEL QE does, and they just found
> > > a new case which wasn't on test list so proactive measures
> > > wouldn't work here in any case as we didn't know about
> > > this particular combination.
> > > 
> > > I'm not sure how it will work with upstream avocado though,
> > > windows testing implies tester would need access to MSDN
> > > subscription or multiple retail versions to test against.
> > > So with windows it becomes expensive and complicated
> > > hence I'd leave this job to QE which has resources and
> > > upstream would benefit from downstream when a bug is found
> > > (albeit it's a catch up game).  
> > 
> > I don't mean functional testing of Windows guests.  I'm just
> > looking for a way we can ensure we won't reintroduce this
> > particular bug later.  We should be able to encode known
> > requirements of existing guest OSes in test code (especially the
> > undocumented requirements).
> >
> > In other words, we need test code that will check if the entry
> > you are adding below is still present and contains the right
> > flags, so people won't remove it by mistake.
> known requirements are described in acpi code comment and commit
> messages and maintainer are supposed to check if a change showed
> by bios test is valid and doesn't regress existing state.

I really believe computers are better at that than humans.  If we
can't even describe to a computer how to look for mistakes, I
don't think we can expect humans to spot them.


> Parsing SRAT in test and ensuring that the last entry hasn't changed
> won't help, we already have this by doing comparison with reference
> SRAT.

Comparison against reference SRAT is useless when a patch is
expected to change the ACPI tables, which happens all the time.
I think we can do better than that.

> 
> And if there is a change, the only thing that can somewhat verify
> it is a functional test with windows (known combinations at
> least). Some new sequence/combination might regress it again
> (like one described in commit). An Avocado functional test running
> windows(es) might help if it will test random startup/hotplug combinations,
> run by someone with rights to use windows.
> 
> I think that once I've contributed cpu hotplug testcases to autotest
> but then there appears a new test suite and then another.
> I don't really feel nor have capacity to deal with it, if someone
> contributes testcase to Avocado and tells me how to easily use it,
> I'd gladly run it with windows guests I have access to
> whenever I review/test a patch that might affect windows.

Running tests with real guests is useful, but costly.  I would
like to be able to detect when we break known requirements
without running a guest.

I'm not asking you to write that test code right now, but I'm
trying to see if it's possible to do that and use it as reference
for testing future ACPI patches.  Let's see if I can enumerate
the known requirements for this stub entry that are not
documented in the ACPI spec:

1) Windows expect a stub entry with
   MEM_AFFINITY_HOTPLUGGABLE|MEM_AFFINITY_ENABLED at the end of
   the hotpluggable address space.
2) Linux won't enable SWIOTLB when booted with less 4GB unless
   we add a stub entry to cover the whole hotpluggable address
   space.
3) Windows expect that entry to be bound to the last NUMA node.
4) Windows won't behave well if the stub entry covers the whole
   hotpluggable address space.  We just need to be at the end of
   the address space, leaving a hole between the last DIMM and
   this stub entry.  We add a 1-byte stub entry for that.

The last one is a new requirement we didn't know about, right?
Is my description accurate?  What else we know about Windows
expectations?


> [...]
> > > > @@ -2269,7 +2269,16 @@ static void build_srat_hotpluggable_memory(GArray *table_data, uint64_t base,
> > > >          numamem = acpi_data_push(table_data, sizeof *numamem);
> > > >  
> > > >          if (!info) {
> > > > -            build_srat_memory(numamem, cur, end - cur, default_node,
> > > > +            /*
> > > > +             * Entry is required for Windows to enable memory hotplug in OS
> > > > +             * and for Linux to enable SWIOTLB when booted with less than
> > > > +             * 4G of RAM. Windows works better if the entry sets proximity
> > > > +             * to the highest NUMA node in the machine at the end of the
> > > > +             * reserved space.
> > > > +             * Memory devices may override proximity set by this entry,
> > > > +             * providing _PXM method if necessary.
> > > > +             */
> > > > +            build_srat_memory(numamem, end - 1, 1, default_node,
> > > >                                MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
> > > >              break;
> > > >          }
Igor Mammedov Aug. 9, 2018, 9:10 a.m. UTC | #6
On Thu, 2 Aug 2018 10:39:22 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Thu, Aug 02, 2018 at 12:09:37PM +0200, Igor Mammedov wrote:
> > On Tue, 31 Jul 2018 12:03:22 -0300
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> >   
> > > On Tue, Jul 31, 2018 at 11:53:40AM +0200, Igor Mammedov wrote:  
> > > > On Mon, 30 Jul 2018 17:26:24 -0300
> > > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > > >     
> > > > > On Mon, Jul 30, 2018 at 11:41:41AM +0200, Igor Mammedov wrote:    
> > > > > > Commit 848a1cc1e (hw/acpi-build: build SRAT memory affinity structures for DIMM devices)
> > > > > > broke the first dimm hotplug in following cases:
> > > > > > 
> > > > > >  1: there is no coldplugged dimm in the last numa node
> > > > > >     but there is a coldplugged dimm in another node
> > > > > > 
> > > > > >   -m 4096,slots=4,maxmem=32G               \
> > > > > >   -object memory-backend-ram,id=m0,size=2G \
> > > > > >   -device pc-dimm,memdev=m0,node=0         \
> > > > > >   -numa node,nodeid=0                      \
> > > > > >   -numa node,nodeid=1
> > > > > > 
> > > > > >  2: if order of dimms on CLI is:
> > > > > >        1st plugged dimm in node1
> > > > > >        2nd plugged dimm in node0
> > > > > > 
> > > > > >   -m 4096,slots=4,maxmem=32G               \
> > > > > >   -object memory-backend-ram,size=2G,id=m0 \
> > > > > >   -device pc-dimm,memdev=m0,node=1         \
> > > > > >   -object memory-backend-ram,id=m1,size=2G \
> > > > > >   -device pc-dimm,memdev=m1,node=0         \
> > > > > >   -numa node,nodeid=0                      \
> > > > > >   -numa node,nodeid=1
> > > > > > 
> > > > > > (qemu) object_add memory-backend-ram,id=m2,size=1G
> > > > > > (qemu) device_add pc-dimm,memdev=m2,node=0
> > > > > > 
> > > > > > the first DIMM hotplug to any node except the last one
> > > > > > fails (Windows is unable to online it).
> > > > > > 
> > > > > > Length reduction of stub hotplug memory SRAT entry,
> > > > > > fixes issue for some reason.
> > > > > >       
> > > > > 
> > > > > I'm really bothered by the lack of automated testing for all
> > > > > these NUMA/ACPI corner cases.
> > > > > 
> > > > > This looks like a good candidate for an avocado_qemu test case.
> > > > > Can you show pseudo-code of how exactly the bug fix could be
> > > > > verified, so we can try to write a test case?    
> > > > Sadly I do it manually every time I'm suspect a patch would
> > > > affect the feature. On just has to check if a new memory device
> > > > appeared in device manager and it is in working state (started successfully).
> > > > One also need to run it against to test it against windows version
> > > > that supports memory hot-add (DC ed.).
> > > > 
> > > > It's typically what RHEL QE does, and they just found
> > > > a new case which wasn't on test list so proactive measures
> > > > wouldn't work here in any case as we didn't know about
> > > > this particular combination.
> > > > 
> > > > I'm not sure how it will work with upstream avocado though,
> > > > windows testing implies tester would need access to MSDN
> > > > subscription or multiple retail versions to test against.
> > > > So with windows it becomes expensive and complicated
> > > > hence I'd leave this job to QE which has resources and
> > > > upstream would benefit from downstream when a bug is found
> > > > (albeit it's a catch up game).    
> > > 
> > > I don't mean functional testing of Windows guests.  I'm just
> > > looking for a way we can ensure we won't reintroduce this
> > > particular bug later.  We should be able to encode known
> > > requirements of existing guest OSes in test code (especially the
> > > undocumented requirements).
> > >
> > > In other words, we need test code that will check if the entry
> > > you are adding below is still present and contains the right
> > > flags, so people won't remove it by mistake.  
> > known requirements are described in acpi code comment and commit
> > messages and maintainer are supposed to check if a change showed
> > by bios test is valid and doesn't regress existing state.  
> 
> I really believe computers are better at that than humans.  If we
> can't even describe to a computer how to look for mistakes, I
> don't think we can expect humans to spot them.
> 
> 
> > Parsing SRAT in test and ensuring that the last entry hasn't changed
> > won't help, we already have this by doing comparison with reference
> > SRAT.  
> 
> Comparison against reference SRAT is useless when a patch is
> expected to change the ACPI tables, which happens all the time.
> I think we can do better than that.
> 
> > 
> > And if there is a change, the only thing that can somewhat verify
> > it is a functional test with windows (known combinations at
> > least). Some new sequence/combination might regress it again
> > (like one described in commit). An Avocado functional test running
> > windows(es) might help if it will test random startup/hotplug combinations,
> > run by someone with rights to use windows.
> > 
> > I think that once I've contributed cpu hotplug testcases to autotest
> > but then there appears a new test suite and then another.
> > I don't really feel nor have capacity to deal with it, if someone
> > contributes testcase to Avocado and tells me how to easily use it,
> > I'd gladly run it with windows guests I have access to
> > whenever I review/test a patch that might affect windows.  
> 
> Running tests with real guests is useful, but costly.  I would
> like to be able to detect when we break known requirements
> without running a guest.
> 
> I'm not asking you to write that test code right now, but I'm
> trying to see if it's possible to do that and use it as reference
> for testing future ACPI patches.  Let's see if I can enumerate
> the known requirements for this stub entry that are not
> documented in the ACPI spec:
> 
> 1) Windows expect a stub entry with
>    MEM_AFFINITY_HOTPLUGGABLE|MEM_AFFINITY_ENABLED at the end of
>    the hotpluggable address space.
> 2) Linux won't enable SWIOTLB when booted with less 4GB unless
>    we add a stub entry to cover the whole hotpluggable address
>    space.
> 3) Windows expect that entry to be bound to the last NUMA node.
> 4) Windows won't behave well if the stub entry covers the whole
>    hotpluggable address space.  We just need to be at the end of
>    the address space, leaving a hole between the last DIMM and
>    this stub entry.  We add a 1-byte stub entry for that.
> 
> The last one is a new requirement we didn't know about, right?
> Is my description accurate?  What else we know about Windows
> expectations?
that's all we know now. But to make this work from unit test
one would need to write SRAT table parser to locate and check
this entry.

Anyways for me this would be test won't do much good beside of
telling that entry has changed somehow (which existing SRAT diff
already does). So it's the same warning from maintainer pov,
as I'd need to run real guest tests to ack or veto the change.
Even doing the later doesn't guarantee that it's a safe change
as 848a1cc1e readily demonstrates.
 
[...]
Eduardo Habkost Aug. 9, 2018, 4:10 p.m. UTC | #7
On Thu, Aug 09, 2018 at 11:10:54AM +0200, Igor Mammedov wrote:
> On Thu, 2 Aug 2018 10:39:22 -0300
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > On Thu, Aug 02, 2018 at 12:09:37PM +0200, Igor Mammedov wrote:
> > > On Tue, 31 Jul 2018 12:03:22 -0300
> > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > >   
> > > > On Tue, Jul 31, 2018 at 11:53:40AM +0200, Igor Mammedov wrote:  
> > > > > On Mon, 30 Jul 2018 17:26:24 -0300
> > > > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > > > >     
> > > > > > On Mon, Jul 30, 2018 at 11:41:41AM +0200, Igor Mammedov wrote:    
> > > > > > > Commit 848a1cc1e (hw/acpi-build: build SRAT memory affinity structures for DIMM devices)
> > > > > > > broke the first dimm hotplug in following cases:
> > > > > > > 
> > > > > > >  1: there is no coldplugged dimm in the last numa node
> > > > > > >     but there is a coldplugged dimm in another node
> > > > > > > 
> > > > > > >   -m 4096,slots=4,maxmem=32G               \
> > > > > > >   -object memory-backend-ram,id=m0,size=2G \
> > > > > > >   -device pc-dimm,memdev=m0,node=0         \
> > > > > > >   -numa node,nodeid=0                      \
> > > > > > >   -numa node,nodeid=1
> > > > > > > 
> > > > > > >  2: if order of dimms on CLI is:
> > > > > > >        1st plugged dimm in node1
> > > > > > >        2nd plugged dimm in node0
> > > > > > > 
> > > > > > >   -m 4096,slots=4,maxmem=32G               \
> > > > > > >   -object memory-backend-ram,size=2G,id=m0 \
> > > > > > >   -device pc-dimm,memdev=m0,node=1         \
> > > > > > >   -object memory-backend-ram,id=m1,size=2G \
> > > > > > >   -device pc-dimm,memdev=m1,node=0         \
> > > > > > >   -numa node,nodeid=0                      \
> > > > > > >   -numa node,nodeid=1
> > > > > > > 
> > > > > > > (qemu) object_add memory-backend-ram,id=m2,size=1G
> > > > > > > (qemu) device_add pc-dimm,memdev=m2,node=0
> > > > > > > 
> > > > > > > the first DIMM hotplug to any node except the last one
> > > > > > > fails (Windows is unable to online it).
> > > > > > > 
> > > > > > > Length reduction of stub hotplug memory SRAT entry,
> > > > > > > fixes issue for some reason.
> > > > > > >       
> > > > > > 
> > > > > > I'm really bothered by the lack of automated testing for all
> > > > > > these NUMA/ACPI corner cases.
> > > > > > 
> > > > > > This looks like a good candidate for an avocado_qemu test case.
> > > > > > Can you show pseudo-code of how exactly the bug fix could be
> > > > > > verified, so we can try to write a test case?    
> > > > > Sadly I do it manually every time I'm suspect a patch would
> > > > > affect the feature. On just has to check if a new memory device
> > > > > appeared in device manager and it is in working state (started successfully).
> > > > > One also need to run it against to test it against windows version
> > > > > that supports memory hot-add (DC ed.).
> > > > > 
> > > > > It's typically what RHEL QE does, and they just found
> > > > > a new case which wasn't on test list so proactive measures
> > > > > wouldn't work here in any case as we didn't know about
> > > > > this particular combination.
> > > > > 
> > > > > I'm not sure how it will work with upstream avocado though,
> > > > > windows testing implies tester would need access to MSDN
> > > > > subscription or multiple retail versions to test against.
> > > > > So with windows it becomes expensive and complicated
> > > > > hence I'd leave this job to QE which has resources and
> > > > > upstream would benefit from downstream when a bug is found
> > > > > (albeit it's a catch up game).    
> > > > 
> > > > I don't mean functional testing of Windows guests.  I'm just
> > > > looking for a way we can ensure we won't reintroduce this
> > > > particular bug later.  We should be able to encode known
> > > > requirements of existing guest OSes in test code (especially the
> > > > undocumented requirements).
> > > >
> > > > In other words, we need test code that will check if the entry
> > > > you are adding below is still present and contains the right
> > > > flags, so people won't remove it by mistake.  
> > > known requirements are described in acpi code comment and commit
> > > messages and maintainer are supposed to check if a change showed
> > > by bios test is valid and doesn't regress existing state.  
> > 
> > I really believe computers are better at that than humans.  If we
> > can't even describe to a computer how to look for mistakes, I
> > don't think we can expect humans to spot them.
> > 
> > 
> > > Parsing SRAT in test and ensuring that the last entry hasn't changed
> > > won't help, we already have this by doing comparison with reference
> > > SRAT.  
> > 
> > Comparison against reference SRAT is useless when a patch is
> > expected to change the ACPI tables, which happens all the time.
> > I think we can do better than that.
> > 
> > > 
> > > And if there is a change, the only thing that can somewhat verify
> > > it is a functional test with windows (known combinations at
> > > least). Some new sequence/combination might regress it again
> > > (like one described in commit). An Avocado functional test running
> > > windows(es) might help if it will test random startup/hotplug combinations,
> > > run by someone with rights to use windows.
> > > 
> > > I think that once I've contributed cpu hotplug testcases to autotest
> > > but then there appears a new test suite and then another.
> > > I don't really feel nor have capacity to deal with it, if someone
> > > contributes testcase to Avocado and tells me how to easily use it,
> > > I'd gladly run it with windows guests I have access to
> > > whenever I review/test a patch that might affect windows.  
> > 
> > Running tests with real guests is useful, but costly.  I would
> > like to be able to detect when we break known requirements
> > without running a guest.
> > 
> > I'm not asking you to write that test code right now, but I'm
> > trying to see if it's possible to do that and use it as reference
> > for testing future ACPI patches.  Let's see if I can enumerate
> > the known requirements for this stub entry that are not
> > documented in the ACPI spec:
> > 
> > 1) Windows expect a stub entry with
> >    MEM_AFFINITY_HOTPLUGGABLE|MEM_AFFINITY_ENABLED at the end of
> >    the hotpluggable address space.
> > 2) Linux won't enable SWIOTLB when booted with less 4GB unless
> >    we add a stub entry to cover the whole hotpluggable address
> >    space.
> > 3) Windows expect that entry to be bound to the last NUMA node.
> > 4) Windows won't behave well if the stub entry covers the whole
> >    hotpluggable address space.  We just need to be at the end of
> >    the address space, leaving a hole between the last DIMM and
> >    this stub entry.  We add a 1-byte stub entry for that.
> > 
> > The last one is a new requirement we didn't know about, right?
> > Is my description accurate?  What else we know about Windows
> > expectations?
> that's all we know now. But to make this work from unit test
> one would need to write SRAT table parser to locate and check
> this entry.

Thanks.  It's true that the test would have to parse the SRAT
table, so it's not as trivial as I would like to.

> 
> Anyways for me this would be test won't do much good beside of
> telling that entry has changed somehow (which existing SRAT diff
> already does). So it's the same warning from maintainer pov,
> as I'd need to run real guest tests to ack or veto the change.
> Even doing the later doesn't guarantee that it's a safe change
> as 848a1cc1e readily demonstrates.

Automated testing for specific non-documented requirements would
help a reviewer to decide if the change requires more extensive
testing.  The test output would carry more useful information
than just "SRAT has changed".
Igor Mammedov Aug. 10, 2018, 3:52 p.m. UTC | #8
On Thu, 9 Aug 2018 13:10:44 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Thu, Aug 09, 2018 at 11:10:54AM +0200, Igor Mammedov wrote:
> > On Thu, 2 Aug 2018 10:39:22 -0300
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> >   
> > > On Thu, Aug 02, 2018 at 12:09:37PM +0200, Igor Mammedov wrote:  
> > > > On Tue, 31 Jul 2018 12:03:22 -0300
> > > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > > >     
> > > > > On Tue, Jul 31, 2018 at 11:53:40AM +0200, Igor Mammedov wrote:    
> > > > > > On Mon, 30 Jul 2018 17:26:24 -0300
> > > > > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > > > > >       
> > > > > > > On Mon, Jul 30, 2018 at 11:41:41AM +0200, Igor Mammedov wrote:      
> > > > > > > > Commit 848a1cc1e (hw/acpi-build: build SRAT memory affinity structures for DIMM devices)
> > > > > > > > broke the first dimm hotplug in following cases:
> > > > > > > > 
> > > > > > > >  1: there is no coldplugged dimm in the last numa node
> > > > > > > >     but there is a coldplugged dimm in another node
> > > > > > > > 
> > > > > > > >   -m 4096,slots=4,maxmem=32G               \
> > > > > > > >   -object memory-backend-ram,id=m0,size=2G \
> > > > > > > >   -device pc-dimm,memdev=m0,node=0         \
> > > > > > > >   -numa node,nodeid=0                      \
> > > > > > > >   -numa node,nodeid=1
> > > > > > > > 
> > > > > > > >  2: if order of dimms on CLI is:
> > > > > > > >        1st plugged dimm in node1
> > > > > > > >        2nd plugged dimm in node0
> > > > > > > > 
> > > > > > > >   -m 4096,slots=4,maxmem=32G               \
> > > > > > > >   -object memory-backend-ram,size=2G,id=m0 \
> > > > > > > >   -device pc-dimm,memdev=m0,node=1         \
> > > > > > > >   -object memory-backend-ram,id=m1,size=2G \
> > > > > > > >   -device pc-dimm,memdev=m1,node=0         \
> > > > > > > >   -numa node,nodeid=0                      \
> > > > > > > >   -numa node,nodeid=1
> > > > > > > > 
> > > > > > > > (qemu) object_add memory-backend-ram,id=m2,size=1G
> > > > > > > > (qemu) device_add pc-dimm,memdev=m2,node=0
> > > > > > > > 
> > > > > > > > the first DIMM hotplug to any node except the last one
> > > > > > > > fails (Windows is unable to online it).
> > > > > > > > 
> > > > > > > > Length reduction of stub hotplug memory SRAT entry,
> > > > > > > > fixes issue for some reason.
> > > > > > > >         
> > > > > > > 
> > > > > > > I'm really bothered by the lack of automated testing for all
> > > > > > > these NUMA/ACPI corner cases.
> > > > > > > 
> > > > > > > This looks like a good candidate for an avocado_qemu test case.
> > > > > > > Can you show pseudo-code of how exactly the bug fix could be
> > > > > > > verified, so we can try to write a test case?      
> > > > > > Sadly I do it manually every time I'm suspect a patch would
> > > > > > affect the feature. On just has to check if a new memory device
> > > > > > appeared in device manager and it is in working state (started successfully).
> > > > > > One also need to run it against to test it against windows version
> > > > > > that supports memory hot-add (DC ed.).
> > > > > > 
> > > > > > It's typically what RHEL QE does, and they just found
> > > > > > a new case which wasn't on test list so proactive measures
> > > > > > wouldn't work here in any case as we didn't know about
> > > > > > this particular combination.
> > > > > > 
> > > > > > I'm not sure how it will work with upstream avocado though,
> > > > > > windows testing implies tester would need access to MSDN
> > > > > > subscription or multiple retail versions to test against.
> > > > > > So with windows it becomes expensive and complicated
> > > > > > hence I'd leave this job to QE which has resources and
> > > > > > upstream would benefit from downstream when a bug is found
> > > > > > (albeit it's a catch up game).      
> > > > > 
> > > > > I don't mean functional testing of Windows guests.  I'm just
> > > > > looking for a way we can ensure we won't reintroduce this
> > > > > particular bug later.  We should be able to encode known
> > > > > requirements of existing guest OSes in test code (especially the
> > > > > undocumented requirements).
> > > > >
> > > > > In other words, we need test code that will check if the entry
> > > > > you are adding below is still present and contains the right
> > > > > flags, so people won't remove it by mistake.    
> > > > known requirements are described in acpi code comment and commit
> > > > messages and maintainer are supposed to check if a change showed
> > > > by bios test is valid and doesn't regress existing state.    
> > > 
> > > I really believe computers are better at that than humans.  If we
> > > can't even describe to a computer how to look for mistakes, I
> > > don't think we can expect humans to spot them.
> > > 
> > >   
> > > > Parsing SRAT in test and ensuring that the last entry hasn't changed
> > > > won't help, we already have this by doing comparison with reference
> > > > SRAT.    
> > > 
> > > Comparison against reference SRAT is useless when a patch is
> > > expected to change the ACPI tables, which happens all the time.
> > > I think we can do better than that.
> > >   
> > > > 
> > > > And if there is a change, the only thing that can somewhat verify
> > > > it is a functional test with windows (known combinations at
> > > > least). Some new sequence/combination might regress it again
> > > > (like one described in commit). An Avocado functional test running
> > > > windows(es) might help if it will test random startup/hotplug combinations,
> > > > run by someone with rights to use windows.
> > > > 
> > > > I think that once I've contributed cpu hotplug testcases to autotest
> > > > but then there appears a new test suite and then another.
> > > > I don't really feel nor have capacity to deal with it, if someone
> > > > contributes testcase to Avocado and tells me how to easily use it,
> > > > I'd gladly run it with windows guests I have access to
> > > > whenever I review/test a patch that might affect windows.    
> > > 
> > > Running tests with real guests is useful, but costly.  I would
> > > like to be able to detect when we break known requirements
> > > without running a guest.
> > > 
> > > I'm not asking you to write that test code right now, but I'm
> > > trying to see if it's possible to do that and use it as reference
> > > for testing future ACPI patches.  Let's see if I can enumerate
> > > the known requirements for this stub entry that are not
> > > documented in the ACPI spec:
> > > 
> > > 1) Windows expect a stub entry with
> > >    MEM_AFFINITY_HOTPLUGGABLE|MEM_AFFINITY_ENABLED at the end of
> > >    the hotpluggable address space.
> > > 2) Linux won't enable SWIOTLB when booted with less 4GB unless
> > >    we add a stub entry to cover the whole hotpluggable address
> > >    space.
> > > 3) Windows expect that entry to be bound to the last NUMA node.
> > > 4) Windows won't behave well if the stub entry covers the whole
> > >    hotpluggable address space.  We just need to be at the end of
> > >    the address space, leaving a hole between the last DIMM and
> > >    this stub entry.  We add a 1-byte stub entry for that.
> > > 
> > > The last one is a new requirement we didn't know about, right?
> > > Is my description accurate?  What else we know about Windows
> > > expectations?
this requirements might be not complete, though.
It seems that 2.6 kernel isn't happy about 1 byte entry at the end
and discards SRAT, but restriction was removed since 3.0

So I'll try to find a middle ground that will work for old/new
kernels and windows.


> > that's all we know now. But to make this work from unit test
> > one would need to write SRAT table parser to locate and check
> > this entry.  
> 
> Thanks.  It's true that the test would have to parse the SRAT
> table, so it's not as trivial as I would like to.
> 
> > 
> > Anyways for me this would be test won't do much good beside of
> > telling that entry has changed somehow (which existing SRAT diff
> > already does). So it's the same warning from maintainer pov,
> > as I'd need to run real guest tests to ack or veto the change.
> > Even doing the later doesn't guarantee that it's a safe change
> > as 848a1cc1e readily demonstrates.  
> 
> Automated testing for specific non-documented requirements would
> help a reviewer to decide if the change requires more extensive
> testing.  The test output would carry more useful information
> than just "SRAT has changed".
It shows diff of how it's changed so reviewer should ask if change is
valid and to answer it one should do a lot of testing.
But if it would make a reviewer happier to see a better message that
SRAT has changed it is fine as well. Only requirements appear to be
a moving target.
diff mbox series

Patch

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 9e8350c..b52fdb2 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2269,7 +2269,16 @@  static void build_srat_hotpluggable_memory(GArray *table_data, uint64_t base,
         numamem = acpi_data_push(table_data, sizeof *numamem);
 
         if (!info) {
-            build_srat_memory(numamem, cur, end - cur, default_node,
+            /*
+             * Entry is required for Windows to enable memory hotplug in OS
+             * and for Linux to enable SWIOTLB when booted with less than
+             * 4G of RAM. Windows works better if the entry sets proximity
+             * to the highest NUMA node in the machine at the end of the
+             * reserved space.
+             * Memory devices may override proximity set by this entry,
+             * providing _PXM method if necessary.
+             */
+            build_srat_memory(numamem, end - 1, 1, default_node,
                               MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
             break;
         }
@@ -2402,14 +2411,6 @@  build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
         build_srat_memory(numamem, 0, 0, 0, MEM_AFFINITY_NOFLAGS);
     }
 
-    /*
-     * Entry is required for Windows to enable memory hotplug in OS
-     * and for Linux to enable SWIOTLB when booted with less than
-     * 4G of RAM. Windows works better if the entry sets proximity
-     * to the highest NUMA node in the machine.
-     * Memory devices may override proximity set by this entry,
-     * providing _PXM method if necessary.
-     */
     if (hotplugabble_address_space_size) {
         build_srat_hotpluggable_memory(table_data, machine->device_memory->base,
                                        hotplugabble_address_space_size,