diff mbox series

[ovs-dev] automake: Fix race for atconfig during distclean.

Message ID 20250603164247.2972039-1-i.maximets@ovn.org
State Changes Requested
Delegated to: aaron conole
Headers show
Series [ovs-dev] automake: Fix race for atconfig during distclean. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/cirrus-robot fail cirrus build: failed

Commit Message

Ilya Maximets June 3, 2025, 4:42 p.m. UTC
The dependency graph of distclean target looks somewhat like this:

  distclean
  |
  +-- distclean-am
      |
      +-- clean-am
      |   |
      |   +-- clean-local
      |   +-- clean-generic
      |
      +-- distclean-generic

So, in a parallel build, clean-am and distclean-generic will be
executed in parallel.  distclean-generic among other things is
responsible for removing DISTCLEANFILES, which are necessary for the
'testsuite --clean' ran by clean-local.  This is causing random
distcheck failures in CI:

 test ! -f '../../tests/testsuite'
   || /bin/bash '../../tests/testsuite' -C tests --clean
 ...
 test -z "tests/atconfig tests/atlocal ..."
   || rm -f tests/atconfig tests/atlocal ...
 ...
 ../tests/testsuite: line 4112: ./atconfig: No such file or directory

Fix that by introducing an explicit dependency between two targets
that will force the testsuite clean to be executed before the generic
distribution file cleanup.

Some related old conversation:
  https://lists.gnu.org/archive/html/autoconf/2008-04/msg00012.html

GNU make has a special 'target: | dep' syntax for order-only
dependencies.  But, unfortunately, it doesn't really work for default
automake targets, because as soon as automake sees them defined, it
assumes that we're re-implementing them and doesn't generate the
target content anymore.  So, if we try to add order-only dependency
for distclean-generic or distclean-am, we have to fully re-implement
them.  distclean-am seems like an easier candidate to re-implement.

There is another unfortunate side effect of clean-am not being in the
last step of the make.  manpages.mk is autogenerated, but necessary
for every target, since it contains parts of the definitions.  So,
clean-am removes it, and then it gets re-created for execution of
distcheck-custom.  We need to manually call clean-generic as the very
last step in order to remove this file.

distcheck will break if distclean doesn't clean everything it needs
to clean, so it should not be hard to catch issues if the default
automake targets ever change.

Fixes: 1b233b95b590 ("Use Autotest for the Open vSwitch test suite.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
---
 tests/automake.mk | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Aaron Conole June 3, 2025, 9:14 p.m. UTC | #1
Ilya Maximets <i.maximets@ovn.org> writes:

> The dependency graph of distclean target looks somewhat like this:
>
>   distclean
>   |
>   +-- distclean-am
>       |
>       +-- clean-am
>       |   |
>       |   +-- clean-local
>       |   +-- clean-generic
>       |
>       +-- distclean-generic
>
> So, in a parallel build, clean-am and distclean-generic will be
> executed in parallel.  distclean-generic among other things is
> responsible for removing DISTCLEANFILES, which are necessary for the
> 'testsuite --clean' ran by clean-local.  This is causing random
> distcheck failures in CI:
>
>  test ! -f '../../tests/testsuite'
>    || /bin/bash '../../tests/testsuite' -C tests --clean
>  ...
>  test -z "tests/atconfig tests/atlocal ..."
>    || rm -f tests/atconfig tests/atlocal ...
>  ...
>  ../tests/testsuite: line 4112: ./atconfig: No such file or directory
>
> Fix that by introducing an explicit dependency between two targets
> that will force the testsuite clean to be executed before the generic
> distribution file cleanup.

I guess the issue is that we are using the testsuite script to cleanup
the testsuite subdirs.  Could we just delete the subdirs directly in
clean-local instead?  For example, we make the clean-local target look
like:

---------------------------

clean-testsuite-local:
	-rm -rf tests/testsuite.dir

clean-userspace-testsuite-local:
	-rm -rf tests/system-userspace-testsuite.dir

clean-kernel-testsuite-local:
	-rm -rf tests/system-kmod-testsuite.dir

clean-dpdk-testsuite-local:
	-rm -rf tests/system-dpdk-testsuite.dir

clean-afxdp-testsuite-local:
	-rm -rf tests/system-afxdp-testsuite.dir

clean-system-offloads-testsuite-local:
	-rm -rf tests/system-offloads-testsuite.dir

clean-system-tso-testsuite-local:
	-rm -rf tests/system-tso-testsuite.dir

clean-local: clean-testsuite-local clean-userspace-testsuite-local \
	clean-kernel-testsuite-local clean-dpdk-testsuite-local \
	clean-afxdp-testsuite-local clean-system-offloads-testsuite-local \
	clean-system-tso-testsuite-local

---------------------------

This would mean that we don't run the testsuite script that depends on
atconfig / atlocal (which is where I guess the issue is) and preserves
the ability to do 'parallel' distclean (for whatever that gives).
Moreover, it doesn't seem like as much of a hack (and the above is just
illustrative, because we can probably just add the testsuite directories
to the CLEAN_LOCAL var).

Hope I understood the race correctly.

> Some related old conversation:
>   https://lists.gnu.org/archive/html/autoconf/2008-04/msg00012.html
>
> GNU make has a special 'target: | dep' syntax for order-only
> dependencies.  But, unfortunately, it doesn't really work for default
> automake targets, because as soon as automake sees them defined, it
> assumes that we're re-implementing them and doesn't generate the
> target content anymore.  So, if we try to add order-only dependency
> for distclean-generic or distclean-am, we have to fully re-implement
> them.  distclean-am seems like an easier candidate to re-implement.
>
> There is another unfortunate side effect of clean-am not being in the
> last step of the make.  manpages.mk is autogenerated, but necessary
> for every target, since it contains parts of the definitions.  So,
> clean-am removes it, and then it gets re-created for execution of
> distcheck-custom.  We need to manually call clean-generic as the very
> last step in order to remove this file.
>
> distcheck will break if distclean doesn't clean everything it needs
> to clean, so it should not be hard to catch issues if the default
> automake targets ever change.
>
> Fixes: 1b233b95b590 ("Use Autotest for the Open vSwitch test suite.")
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> ---
>  tests/automake.mk | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/tests/automake.mk b/tests/automake.mk
> index 59f538761..dbcb20dbb 100644
> --- a/tests/automake.mk
> +++ b/tests/automake.mk
> @@ -361,6 +361,20 @@ check-dpdk: all
>  	set $(SHELL) '$(SYSTEM_DPDK_TESTSUITE)' -C tests  AUTOTEST_PATH='$(AUTOTEST_PATH)'; \
>  	"$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
>  
> +# An explicit ordering dependency to avoid race between removing the
> +# $(DISTCLEANFILES) and running the testsite clean.  Unfortunately, it means
> +# overriding the default distclean-am target.  Default distclean-am has all
> +# dependencies of distclean-custom, plus clean-am.  'manpages.mk' is generated
> +# every time any target is invoked, so since clean-am is no longer part of the
> +# last target, we have to call the clean-generic manually to remove the
> +# generated file.
> +distclean-am: clean-am
> +	@$(MAKE) distclean-custom
> +distclean-custom: distclean-compile distclean-generic distclean-hdr \
> +		distclean-libtool distclean-tags
> +	@$(MAKE) clean-generic
> +.PHONY: distclean-am distclean-custom
> +
>  clean-local:
>  	test ! -f '$(TESTSUITE)' || $(SHELL) '$(TESTSUITE)' -C tests --clean
>
Ilya Maximets June 4, 2025, 7:44 a.m. UTC | #2
On 6/3/25 11:14 PM, Aaron Conole wrote:
> Ilya Maximets <i.maximets@ovn.org> writes:
> 
>> The dependency graph of distclean target looks somewhat like this:
>>
>>   distclean
>>   |
>>   +-- distclean-am
>>       |
>>       +-- clean-am
>>       |   |
>>       |   +-- clean-local
>>       |   +-- clean-generic
>>       |
>>       +-- distclean-generic
>>
>> So, in a parallel build, clean-am and distclean-generic will be
>> executed in parallel.  distclean-generic among other things is
>> responsible for removing DISTCLEANFILES, which are necessary for the
>> 'testsuite --clean' ran by clean-local.  This is causing random
>> distcheck failures in CI:
>>
>>  test ! -f '../../tests/testsuite'
>>    || /bin/bash '../../tests/testsuite' -C tests --clean
>>  ...
>>  test -z "tests/atconfig tests/atlocal ..."
>>    || rm -f tests/atconfig tests/atlocal ...
>>  ...
>>  ../tests/testsuite: line 4112: ./atconfig: No such file or directory
>>
>> Fix that by introducing an explicit dependency between two targets
>> that will force the testsuite clean to be executed before the generic
>> distribution file cleanup.
> 
> I guess the issue is that we are using the testsuite script to cleanup
> the testsuite subdirs.  Could we just delete the subdirs directly in
> clean-local instead?  For example, we make the clean-local target look
> like:
> 
> ---------------------------
> 
> clean-testsuite-local:
> 	-rm -rf tests/testsuite.dir
> 
> clean-userspace-testsuite-local:
> 	-rm -rf tests/system-userspace-testsuite.dir
> 
> clean-kernel-testsuite-local:
> 	-rm -rf tests/system-kmod-testsuite.dir
> 
> clean-dpdk-testsuite-local:
> 	-rm -rf tests/system-dpdk-testsuite.dir
> 
> clean-afxdp-testsuite-local:
> 	-rm -rf tests/system-afxdp-testsuite.dir
> 
> clean-system-offloads-testsuite-local:
> 	-rm -rf tests/system-offloads-testsuite.dir
> 
> clean-system-tso-testsuite-local:
> 	-rm -rf tests/system-tso-testsuite.dir
> 
> clean-local: clean-testsuite-local clean-userspace-testsuite-local \
> 	clean-kernel-testsuite-local clean-dpdk-testsuite-local \
> 	clean-afxdp-testsuite-local clean-system-offloads-testsuite-local \
> 	clean-system-tso-testsuite-local
> 
> ---------------------------
> 
> This would mean that we don't run the testsuite script that depends on
> atconfig / atlocal (which is where I guess the issue is) and preserves
> the ability to do 'parallel' distclean (for whatever that gives).

Dependencies of the regular clean and dependencies of the distclean-am are
still executed in parallel, but now it is all deps of the clean-am and then
all the deps of the distclean-am.  Just to clarify.

> Moreover, it doesn't seem like as much of a hack (and the above is just
> illustrative, because we can probably just add the testsuite directories
> to the CLEAN_LOCAL var).

testsuite --clean does a little more than just removal of the directory,
it also cleans up the log file (which is not in the directory) and makes
sure that it can actually remove the files by making sure everything inside
has rwx permissions, which is possible that not everything does.  We can
replicate this logic in the automake, sure, it doesn't sound hard to do.
But both duplicating the logic of autotest and re-implementing the automake
target are a bit hacky.  In the first case we'll need to adjust our make
files if autotest changes, in the second - if automake changes.  Both seem
equally unlikely.  So, I'm not sure which is better.  Re-implementing the
testsuite --clean seems indeed a bit easier to understand as everything
is implemented in the plain sight and not buried inside the automake.

Note: we're not actually running the clean for any other testsuite, except
for the base one, but we probably should.  It's not causing any  problems
today since these are not running during distcheck.
I'm also a little bit confused by the fact that we're building the test
suite scripts into srcdir...  they should be in a build directory.

I'll try to look what it takes to re-implement the testsuite --clean, but
it will be a larger change for sure.

Best regards, Ilya Maximets.

> 
> Hope I understood the race correctly.
> 
>> Some related old conversation:
>>   https://lists.gnu.org/archive/html/autoconf/2008-04/msg00012.html
>>
>> GNU make has a special 'target: | dep' syntax for order-only
>> dependencies.  But, unfortunately, it doesn't really work for default
>> automake targets, because as soon as automake sees them defined, it
>> assumes that we're re-implementing them and doesn't generate the
>> target content anymore.  So, if we try to add order-only dependency
>> for distclean-generic or distclean-am, we have to fully re-implement
>> them.  distclean-am seems like an easier candidate to re-implement.
>>
>> There is another unfortunate side effect of clean-am not being in the
>> last step of the make.  manpages.mk is autogenerated, but necessary
>> for every target, since it contains parts of the definitions.  So,
>> clean-am removes it, and then it gets re-created for execution of
>> distcheck-custom.  We need to manually call clean-generic as the very
>> last step in order to remove this file.
>>
>> distcheck will break if distclean doesn't clean everything it needs
>> to clean, so it should not be hard to catch issues if the default
>> automake targets ever change.
>>
>> Fixes: 1b233b95b590 ("Use Autotest for the Open vSwitch test suite.")
>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>> ---
>>  tests/automake.mk | 14 ++++++++++++++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/tests/automake.mk b/tests/automake.mk
>> index 59f538761..dbcb20dbb 100644
>> --- a/tests/automake.mk
>> +++ b/tests/automake.mk
>> @@ -361,6 +361,20 @@ check-dpdk: all
>>  	set $(SHELL) '$(SYSTEM_DPDK_TESTSUITE)' -C tests  AUTOTEST_PATH='$(AUTOTEST_PATH)'; \
>>  	"$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
>>  
>> +# An explicit ordering dependency to avoid race between removing the
>> +# $(DISTCLEANFILES) and running the testsite clean.  Unfortunately, it means
>> +# overriding the default distclean-am target.  Default distclean-am has all
>> +# dependencies of distclean-custom, plus clean-am.  'manpages.mk' is generated
>> +# every time any target is invoked, so since clean-am is no longer part of the
>> +# last target, we have to call the clean-generic manually to remove the
>> +# generated file.
>> +distclean-am: clean-am
>> +	@$(MAKE) distclean-custom
>> +distclean-custom: distclean-compile distclean-generic distclean-hdr \
>> +		distclean-libtool distclean-tags
>> +	@$(MAKE) clean-generic
>> +.PHONY: distclean-am distclean-custom
>> +
>>  clean-local:
>>  	test ! -f '$(TESTSUITE)' || $(SHELL) '$(TESTSUITE)' -C tests --clean
>>  
>
Aaron Conole June 4, 2025, 3:08 p.m. UTC | #3
Ilya Maximets <i.maximets@ovn.org> writes:

> On 6/3/25 11:14 PM, Aaron Conole wrote:
>> Ilya Maximets <i.maximets@ovn.org> writes:
>> 
>>> The dependency graph of distclean target looks somewhat like this:
>>>
>>>   distclean
>>>   |
>>>   +-- distclean-am
>>>       |
>>>       +-- clean-am
>>>       |   |
>>>       |   +-- clean-local
>>>       |   +-- clean-generic
>>>       |
>>>       +-- distclean-generic
>>>
>>> So, in a parallel build, clean-am and distclean-generic will be
>>> executed in parallel.  distclean-generic among other things is
>>> responsible for removing DISTCLEANFILES, which are necessary for the
>>> 'testsuite --clean' ran by clean-local.  This is causing random
>>> distcheck failures in CI:
>>>
>>>  test ! -f '../../tests/testsuite'
>>>    || /bin/bash '../../tests/testsuite' -C tests --clean
>>>  ...
>>>  test -z "tests/atconfig tests/atlocal ..."
>>>    || rm -f tests/atconfig tests/atlocal ...
>>>  ...
>>>  ../tests/testsuite: line 4112: ./atconfig: No such file or directory
>>>
>>> Fix that by introducing an explicit dependency between two targets
>>> that will force the testsuite clean to be executed before the generic
>>> distribution file cleanup.
>> 
>> I guess the issue is that we are using the testsuite script to cleanup
>> the testsuite subdirs.  Could we just delete the subdirs directly in
>> clean-local instead?  For example, we make the clean-local target look
>> like:
>> 
>> ---------------------------
>> 
>> clean-testsuite-local:
>> 	-rm -rf tests/testsuite.dir
>> 
>> clean-userspace-testsuite-local:
>> 	-rm -rf tests/system-userspace-testsuite.dir
>> 
>> clean-kernel-testsuite-local:
>> 	-rm -rf tests/system-kmod-testsuite.dir
>> 
>> clean-dpdk-testsuite-local:
>> 	-rm -rf tests/system-dpdk-testsuite.dir
>> 
>> clean-afxdp-testsuite-local:
>> 	-rm -rf tests/system-afxdp-testsuite.dir
>> 
>> clean-system-offloads-testsuite-local:
>> 	-rm -rf tests/system-offloads-testsuite.dir
>> 
>> clean-system-tso-testsuite-local:
>> 	-rm -rf tests/system-tso-testsuite.dir
>> 
>> clean-local: clean-testsuite-local clean-userspace-testsuite-local \
>> 	clean-kernel-testsuite-local clean-dpdk-testsuite-local \
>> 	clean-afxdp-testsuite-local clean-system-offloads-testsuite-local \
>> 	clean-system-tso-testsuite-local
>> 
>> ---------------------------
>> 
>> This would mean that we don't run the testsuite script that depends on
>> atconfig / atlocal (which is where I guess the issue is) and preserves
>> the ability to do 'parallel' distclean (for whatever that gives).
>
> Dependencies of the regular clean and dependencies of the distclean-am are
> still executed in parallel, but now it is all deps of the clean-am and then
> all the deps of the distclean-am.  Just to clarify.
>
>> Moreover, it doesn't seem like as much of a hack (and the above is just
>> illustrative, because we can probably just add the testsuite directories
>> to the CLEAN_LOCAL var).
>
> testsuite --clean does a little more than just removal of the directory,
> it also cleans up the log file (which is not in the directory) and makes
> sure that it can actually remove the files by making sure everything inside
> has rwx permissions, which is possible that not everything does.  We can
> replicate this logic in the automake, sure, it doesn't sound hard to do.
> But both duplicating the logic of autotest and re-implementing the automake
> target are a bit hacky.  In the first case we'll need to adjust our make
> files if autotest changes, in the second - if automake changes.  Both seem
> equally unlikely.  So, I'm not sure which is better.  Re-implementing the
> testsuite --clean seems indeed a bit easier to understand as everything
> is implemented in the plain sight and not buried inside the automake.

Agreed actually - they are both hacky.  At least replicating the clean
might give us the ability to do other things in the future that could
pop up (I can't think of anything off the top of my head though).

> Note: we're not actually running the clean for any other testsuite, except
> for the base one, but we probably should.  It's not causing any  problems
> today since these are not running during distcheck.
> I'm also a little bit confused by the fact that we're building the test
> suite scripts into srcdir...  they should be in a build directory.

Agreed.  That's at least something that we could add for the clean,
although then we would need distcheck to run them; maybe it should.

> I'll try to look what it takes to re-implement the testsuite --clean, but
> it will be a larger change for sure.

Thanks Ilya!  You rock!

> Best regards, Ilya Maximets.
>
>> 
>> Hope I understood the race correctly.
>> 
>>> Some related old conversation:
>>>   https://lists.gnu.org/archive/html/autoconf/2008-04/msg00012.html
>>>
>>> GNU make has a special 'target: | dep' syntax for order-only
>>> dependencies.  But, unfortunately, it doesn't really work for default
>>> automake targets, because as soon as automake sees them defined, it
>>> assumes that we're re-implementing them and doesn't generate the
>>> target content anymore.  So, if we try to add order-only dependency
>>> for distclean-generic or distclean-am, we have to fully re-implement
>>> them.  distclean-am seems like an easier candidate to re-implement.
>>>
>>> There is another unfortunate side effect of clean-am not being in the
>>> last step of the make.  manpages.mk is autogenerated, but necessary
>>> for every target, since it contains parts of the definitions.  So,
>>> clean-am removes it, and then it gets re-created for execution of
>>> distcheck-custom.  We need to manually call clean-generic as the very
>>> last step in order to remove this file.
>>>
>>> distcheck will break if distclean doesn't clean everything it needs
>>> to clean, so it should not be hard to catch issues if the default
>>> automake targets ever change.
>>>
>>> Fixes: 1b233b95b590 ("Use Autotest for the Open vSwitch test suite.")
>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>> ---
>>>  tests/automake.mk | 14 ++++++++++++++
>>>  1 file changed, 14 insertions(+)
>>>
>>> diff --git a/tests/automake.mk b/tests/automake.mk
>>> index 59f538761..dbcb20dbb 100644
>>> --- a/tests/automake.mk
>>> +++ b/tests/automake.mk
>>> @@ -361,6 +361,20 @@ check-dpdk: all
>>>  	set $(SHELL) '$(SYSTEM_DPDK_TESTSUITE)' -C tests  AUTOTEST_PATH='$(AUTOTEST_PATH)'; \
>>>  	"$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
>>>  
>>> +# An explicit ordering dependency to avoid race between removing the
>>> +# $(DISTCLEANFILES) and running the testsite clean.  Unfortunately, it means
>>> +# overriding the default distclean-am target.  Default distclean-am has all
>>> +# dependencies of distclean-custom, plus clean-am.  'manpages.mk' is generated
>>> +# every time any target is invoked, so since clean-am is no longer part of the
>>> +# last target, we have to call the clean-generic manually to remove the
>>> +# generated file.
>>> +distclean-am: clean-am
>>> +	@$(MAKE) distclean-custom
>>> +distclean-custom: distclean-compile distclean-generic distclean-hdr \
>>> +		distclean-libtool distclean-tags
>>> +	@$(MAKE) clean-generic
>>> +.PHONY: distclean-am distclean-custom
>>> +
>>>  clean-local:
>>>  	test ! -f '$(TESTSUITE)' || $(SHELL) '$(TESTSUITE)' -C tests --clean
>>>  
>>
diff mbox series

Patch

diff --git a/tests/automake.mk b/tests/automake.mk
index 59f538761..dbcb20dbb 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -361,6 +361,20 @@  check-dpdk: all
 	set $(SHELL) '$(SYSTEM_DPDK_TESTSUITE)' -C tests  AUTOTEST_PATH='$(AUTOTEST_PATH)'; \
 	"$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
 
+# An explicit ordering dependency to avoid race between removing the
+# $(DISTCLEANFILES) and running the testsite clean.  Unfortunately, it means
+# overriding the default distclean-am target.  Default distclean-am has all
+# dependencies of distclean-custom, plus clean-am.  'manpages.mk' is generated
+# every time any target is invoked, so since clean-am is no longer part of the
+# last target, we have to call the clean-generic manually to remove the
+# generated file.
+distclean-am: clean-am
+	@$(MAKE) distclean-custom
+distclean-custom: distclean-compile distclean-generic distclean-hdr \
+		distclean-libtool distclean-tags
+	@$(MAKE) clean-generic
+.PHONY: distclean-am distclean-custom
+
 clean-local:
 	test ! -f '$(TESTSUITE)' || $(SHELL) '$(TESTSUITE)' -C tests --clean