Message ID | 20250603164247.2972039-1-i.maximets@ovn.org |
---|---|
State | Changes Requested |
Delegated to: | aaron conole |
Headers | show |
Series | [ovs-dev] automake: Fix race for atconfig during distclean. | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/cirrus-robot | fail | cirrus build: failed |
Ilya Maximets <i.maximets@ovn.org> writes: > The dependency graph of distclean target looks somewhat like this: > > distclean > | > +-- distclean-am > | > +-- clean-am > | | > | +-- clean-local > | +-- clean-generic > | > +-- distclean-generic > > So, in a parallel build, clean-am and distclean-generic will be > executed in parallel. distclean-generic among other things is > responsible for removing DISTCLEANFILES, which are necessary for the > 'testsuite --clean' ran by clean-local. This is causing random > distcheck failures in CI: > > test ! -f '../../tests/testsuite' > || /bin/bash '../../tests/testsuite' -C tests --clean > ... > test -z "tests/atconfig tests/atlocal ..." > || rm -f tests/atconfig tests/atlocal ... > ... > ../tests/testsuite: line 4112: ./atconfig: No such file or directory > > Fix that by introducing an explicit dependency between two targets > that will force the testsuite clean to be executed before the generic > distribution file cleanup. I guess the issue is that we are using the testsuite script to cleanup the testsuite subdirs. Could we just delete the subdirs directly in clean-local instead? For example, we make the clean-local target look like: --------------------------- clean-testsuite-local: -rm -rf tests/testsuite.dir clean-userspace-testsuite-local: -rm -rf tests/system-userspace-testsuite.dir clean-kernel-testsuite-local: -rm -rf tests/system-kmod-testsuite.dir clean-dpdk-testsuite-local: -rm -rf tests/system-dpdk-testsuite.dir clean-afxdp-testsuite-local: -rm -rf tests/system-afxdp-testsuite.dir clean-system-offloads-testsuite-local: -rm -rf tests/system-offloads-testsuite.dir clean-system-tso-testsuite-local: -rm -rf tests/system-tso-testsuite.dir clean-local: clean-testsuite-local clean-userspace-testsuite-local \ clean-kernel-testsuite-local clean-dpdk-testsuite-local \ clean-afxdp-testsuite-local clean-system-offloads-testsuite-local \ clean-system-tso-testsuite-local --------------------------- This would mean that we don't run the testsuite script that depends on atconfig / atlocal (which is where I guess the issue is) and preserves the ability to do 'parallel' distclean (for whatever that gives). Moreover, it doesn't seem like as much of a hack (and the above is just illustrative, because we can probably just add the testsuite directories to the CLEAN_LOCAL var). Hope I understood the race correctly. > Some related old conversation: > https://lists.gnu.org/archive/html/autoconf/2008-04/msg00012.html > > GNU make has a special 'target: | dep' syntax for order-only > dependencies. But, unfortunately, it doesn't really work for default > automake targets, because as soon as automake sees them defined, it > assumes that we're re-implementing them and doesn't generate the > target content anymore. So, if we try to add order-only dependency > for distclean-generic or distclean-am, we have to fully re-implement > them. distclean-am seems like an easier candidate to re-implement. > > There is another unfortunate side effect of clean-am not being in the > last step of the make. manpages.mk is autogenerated, but necessary > for every target, since it contains parts of the definitions. So, > clean-am removes it, and then it gets re-created for execution of > distcheck-custom. We need to manually call clean-generic as the very > last step in order to remove this file. > > distcheck will break if distclean doesn't clean everything it needs > to clean, so it should not be hard to catch issues if the default > automake targets ever change. > > Fixes: 1b233b95b590 ("Use Autotest for the Open vSwitch test suite.") > Signed-off-by: Ilya Maximets <i.maximets@ovn.org> > --- > tests/automake.mk | 14 ++++++++++++++ > 1 file changed, 14 insertions(+) > > diff --git a/tests/automake.mk b/tests/automake.mk > index 59f538761..dbcb20dbb 100644 > --- a/tests/automake.mk > +++ b/tests/automake.mk > @@ -361,6 +361,20 @@ check-dpdk: all > set $(SHELL) '$(SYSTEM_DPDK_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)'; \ > "$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@" --recheck) > > +# An explicit ordering dependency to avoid race between removing the > +# $(DISTCLEANFILES) and running the testsite clean. Unfortunately, it means > +# overriding the default distclean-am target. Default distclean-am has all > +# dependencies of distclean-custom, plus clean-am. 'manpages.mk' is generated > +# every time any target is invoked, so since clean-am is no longer part of the > +# last target, we have to call the clean-generic manually to remove the > +# generated file. > +distclean-am: clean-am > + @$(MAKE) distclean-custom > +distclean-custom: distclean-compile distclean-generic distclean-hdr \ > + distclean-libtool distclean-tags > + @$(MAKE) clean-generic > +.PHONY: distclean-am distclean-custom > + > clean-local: > test ! -f '$(TESTSUITE)' || $(SHELL) '$(TESTSUITE)' -C tests --clean >
On 6/3/25 11:14 PM, Aaron Conole wrote: > Ilya Maximets <i.maximets@ovn.org> writes: > >> The dependency graph of distclean target looks somewhat like this: >> >> distclean >> | >> +-- distclean-am >> | >> +-- clean-am >> | | >> | +-- clean-local >> | +-- clean-generic >> | >> +-- distclean-generic >> >> So, in a parallel build, clean-am and distclean-generic will be >> executed in parallel. distclean-generic among other things is >> responsible for removing DISTCLEANFILES, which are necessary for the >> 'testsuite --clean' ran by clean-local. This is causing random >> distcheck failures in CI: >> >> test ! -f '../../tests/testsuite' >> || /bin/bash '../../tests/testsuite' -C tests --clean >> ... >> test -z "tests/atconfig tests/atlocal ..." >> || rm -f tests/atconfig tests/atlocal ... >> ... >> ../tests/testsuite: line 4112: ./atconfig: No such file or directory >> >> Fix that by introducing an explicit dependency between two targets >> that will force the testsuite clean to be executed before the generic >> distribution file cleanup. > > I guess the issue is that we are using the testsuite script to cleanup > the testsuite subdirs. Could we just delete the subdirs directly in > clean-local instead? For example, we make the clean-local target look > like: > > --------------------------- > > clean-testsuite-local: > -rm -rf tests/testsuite.dir > > clean-userspace-testsuite-local: > -rm -rf tests/system-userspace-testsuite.dir > > clean-kernel-testsuite-local: > -rm -rf tests/system-kmod-testsuite.dir > > clean-dpdk-testsuite-local: > -rm -rf tests/system-dpdk-testsuite.dir > > clean-afxdp-testsuite-local: > -rm -rf tests/system-afxdp-testsuite.dir > > clean-system-offloads-testsuite-local: > -rm -rf tests/system-offloads-testsuite.dir > > clean-system-tso-testsuite-local: > -rm -rf tests/system-tso-testsuite.dir > > clean-local: clean-testsuite-local clean-userspace-testsuite-local \ > clean-kernel-testsuite-local clean-dpdk-testsuite-local \ > clean-afxdp-testsuite-local clean-system-offloads-testsuite-local \ > clean-system-tso-testsuite-local > > --------------------------- > > This would mean that we don't run the testsuite script that depends on > atconfig / atlocal (which is where I guess the issue is) and preserves > the ability to do 'parallel' distclean (for whatever that gives). Dependencies of the regular clean and dependencies of the distclean-am are still executed in parallel, but now it is all deps of the clean-am and then all the deps of the distclean-am. Just to clarify. > Moreover, it doesn't seem like as much of a hack (and the above is just > illustrative, because we can probably just add the testsuite directories > to the CLEAN_LOCAL var). testsuite --clean does a little more than just removal of the directory, it also cleans up the log file (which is not in the directory) and makes sure that it can actually remove the files by making sure everything inside has rwx permissions, which is possible that not everything does. We can replicate this logic in the automake, sure, it doesn't sound hard to do. But both duplicating the logic of autotest and re-implementing the automake target are a bit hacky. In the first case we'll need to adjust our make files if autotest changes, in the second - if automake changes. Both seem equally unlikely. So, I'm not sure which is better. Re-implementing the testsuite --clean seems indeed a bit easier to understand as everything is implemented in the plain sight and not buried inside the automake. Note: we're not actually running the clean for any other testsuite, except for the base one, but we probably should. It's not causing any problems today since these are not running during distcheck. I'm also a little bit confused by the fact that we're building the test suite scripts into srcdir... they should be in a build directory. I'll try to look what it takes to re-implement the testsuite --clean, but it will be a larger change for sure. Best regards, Ilya Maximets. > > Hope I understood the race correctly. > >> Some related old conversation: >> https://lists.gnu.org/archive/html/autoconf/2008-04/msg00012.html >> >> GNU make has a special 'target: | dep' syntax for order-only >> dependencies. But, unfortunately, it doesn't really work for default >> automake targets, because as soon as automake sees them defined, it >> assumes that we're re-implementing them and doesn't generate the >> target content anymore. So, if we try to add order-only dependency >> for distclean-generic or distclean-am, we have to fully re-implement >> them. distclean-am seems like an easier candidate to re-implement. >> >> There is another unfortunate side effect of clean-am not being in the >> last step of the make. manpages.mk is autogenerated, but necessary >> for every target, since it contains parts of the definitions. So, >> clean-am removes it, and then it gets re-created for execution of >> distcheck-custom. We need to manually call clean-generic as the very >> last step in order to remove this file. >> >> distcheck will break if distclean doesn't clean everything it needs >> to clean, so it should not be hard to catch issues if the default >> automake targets ever change. >> >> Fixes: 1b233b95b590 ("Use Autotest for the Open vSwitch test suite.") >> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> >> --- >> tests/automake.mk | 14 ++++++++++++++ >> 1 file changed, 14 insertions(+) >> >> diff --git a/tests/automake.mk b/tests/automake.mk >> index 59f538761..dbcb20dbb 100644 >> --- a/tests/automake.mk >> +++ b/tests/automake.mk >> @@ -361,6 +361,20 @@ check-dpdk: all >> set $(SHELL) '$(SYSTEM_DPDK_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)'; \ >> "$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@" --recheck) >> >> +# An explicit ordering dependency to avoid race between removing the >> +# $(DISTCLEANFILES) and running the testsite clean. Unfortunately, it means >> +# overriding the default distclean-am target. Default distclean-am has all >> +# dependencies of distclean-custom, plus clean-am. 'manpages.mk' is generated >> +# every time any target is invoked, so since clean-am is no longer part of the >> +# last target, we have to call the clean-generic manually to remove the >> +# generated file. >> +distclean-am: clean-am >> + @$(MAKE) distclean-custom >> +distclean-custom: distclean-compile distclean-generic distclean-hdr \ >> + distclean-libtool distclean-tags >> + @$(MAKE) clean-generic >> +.PHONY: distclean-am distclean-custom >> + >> clean-local: >> test ! -f '$(TESTSUITE)' || $(SHELL) '$(TESTSUITE)' -C tests --clean >> >
Ilya Maximets <i.maximets@ovn.org> writes: > On 6/3/25 11:14 PM, Aaron Conole wrote: >> Ilya Maximets <i.maximets@ovn.org> writes: >> >>> The dependency graph of distclean target looks somewhat like this: >>> >>> distclean >>> | >>> +-- distclean-am >>> | >>> +-- clean-am >>> | | >>> | +-- clean-local >>> | +-- clean-generic >>> | >>> +-- distclean-generic >>> >>> So, in a parallel build, clean-am and distclean-generic will be >>> executed in parallel. distclean-generic among other things is >>> responsible for removing DISTCLEANFILES, which are necessary for the >>> 'testsuite --clean' ran by clean-local. This is causing random >>> distcheck failures in CI: >>> >>> test ! -f '../../tests/testsuite' >>> || /bin/bash '../../tests/testsuite' -C tests --clean >>> ... >>> test -z "tests/atconfig tests/atlocal ..." >>> || rm -f tests/atconfig tests/atlocal ... >>> ... >>> ../tests/testsuite: line 4112: ./atconfig: No such file or directory >>> >>> Fix that by introducing an explicit dependency between two targets >>> that will force the testsuite clean to be executed before the generic >>> distribution file cleanup. >> >> I guess the issue is that we are using the testsuite script to cleanup >> the testsuite subdirs. Could we just delete the subdirs directly in >> clean-local instead? For example, we make the clean-local target look >> like: >> >> --------------------------- >> >> clean-testsuite-local: >> -rm -rf tests/testsuite.dir >> >> clean-userspace-testsuite-local: >> -rm -rf tests/system-userspace-testsuite.dir >> >> clean-kernel-testsuite-local: >> -rm -rf tests/system-kmod-testsuite.dir >> >> clean-dpdk-testsuite-local: >> -rm -rf tests/system-dpdk-testsuite.dir >> >> clean-afxdp-testsuite-local: >> -rm -rf tests/system-afxdp-testsuite.dir >> >> clean-system-offloads-testsuite-local: >> -rm -rf tests/system-offloads-testsuite.dir >> >> clean-system-tso-testsuite-local: >> -rm -rf tests/system-tso-testsuite.dir >> >> clean-local: clean-testsuite-local clean-userspace-testsuite-local \ >> clean-kernel-testsuite-local clean-dpdk-testsuite-local \ >> clean-afxdp-testsuite-local clean-system-offloads-testsuite-local \ >> clean-system-tso-testsuite-local >> >> --------------------------- >> >> This would mean that we don't run the testsuite script that depends on >> atconfig / atlocal (which is where I guess the issue is) and preserves >> the ability to do 'parallel' distclean (for whatever that gives). > > Dependencies of the regular clean and dependencies of the distclean-am are > still executed in parallel, but now it is all deps of the clean-am and then > all the deps of the distclean-am. Just to clarify. > >> Moreover, it doesn't seem like as much of a hack (and the above is just >> illustrative, because we can probably just add the testsuite directories >> to the CLEAN_LOCAL var). > > testsuite --clean does a little more than just removal of the directory, > it also cleans up the log file (which is not in the directory) and makes > sure that it can actually remove the files by making sure everything inside > has rwx permissions, which is possible that not everything does. We can > replicate this logic in the automake, sure, it doesn't sound hard to do. > But both duplicating the logic of autotest and re-implementing the automake > target are a bit hacky. In the first case we'll need to adjust our make > files if autotest changes, in the second - if automake changes. Both seem > equally unlikely. So, I'm not sure which is better. Re-implementing the > testsuite --clean seems indeed a bit easier to understand as everything > is implemented in the plain sight and not buried inside the automake. Agreed actually - they are both hacky. At least replicating the clean might give us the ability to do other things in the future that could pop up (I can't think of anything off the top of my head though). > Note: we're not actually running the clean for any other testsuite, except > for the base one, but we probably should. It's not causing any problems > today since these are not running during distcheck. > I'm also a little bit confused by the fact that we're building the test > suite scripts into srcdir... they should be in a build directory. Agreed. That's at least something that we could add for the clean, although then we would need distcheck to run them; maybe it should. > I'll try to look what it takes to re-implement the testsuite --clean, but > it will be a larger change for sure. Thanks Ilya! You rock! > Best regards, Ilya Maximets. > >> >> Hope I understood the race correctly. >> >>> Some related old conversation: >>> https://lists.gnu.org/archive/html/autoconf/2008-04/msg00012.html >>> >>> GNU make has a special 'target: | dep' syntax for order-only >>> dependencies. But, unfortunately, it doesn't really work for default >>> automake targets, because as soon as automake sees them defined, it >>> assumes that we're re-implementing them and doesn't generate the >>> target content anymore. So, if we try to add order-only dependency >>> for distclean-generic or distclean-am, we have to fully re-implement >>> them. distclean-am seems like an easier candidate to re-implement. >>> >>> There is another unfortunate side effect of clean-am not being in the >>> last step of the make. manpages.mk is autogenerated, but necessary >>> for every target, since it contains parts of the definitions. So, >>> clean-am removes it, and then it gets re-created for execution of >>> distcheck-custom. We need to manually call clean-generic as the very >>> last step in order to remove this file. >>> >>> distcheck will break if distclean doesn't clean everything it needs >>> to clean, so it should not be hard to catch issues if the default >>> automake targets ever change. >>> >>> Fixes: 1b233b95b590 ("Use Autotest for the Open vSwitch test suite.") >>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> >>> --- >>> tests/automake.mk | 14 ++++++++++++++ >>> 1 file changed, 14 insertions(+) >>> >>> diff --git a/tests/automake.mk b/tests/automake.mk >>> index 59f538761..dbcb20dbb 100644 >>> --- a/tests/automake.mk >>> +++ b/tests/automake.mk >>> @@ -361,6 +361,20 @@ check-dpdk: all >>> set $(SHELL) '$(SYSTEM_DPDK_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)'; \ >>> "$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@" --recheck) >>> >>> +# An explicit ordering dependency to avoid race between removing the >>> +# $(DISTCLEANFILES) and running the testsite clean. Unfortunately, it means >>> +# overriding the default distclean-am target. Default distclean-am has all >>> +# dependencies of distclean-custom, plus clean-am. 'manpages.mk' is generated >>> +# every time any target is invoked, so since clean-am is no longer part of the >>> +# last target, we have to call the clean-generic manually to remove the >>> +# generated file. >>> +distclean-am: clean-am >>> + @$(MAKE) distclean-custom >>> +distclean-custom: distclean-compile distclean-generic distclean-hdr \ >>> + distclean-libtool distclean-tags >>> + @$(MAKE) clean-generic >>> +.PHONY: distclean-am distclean-custom >>> + >>> clean-local: >>> test ! -f '$(TESTSUITE)' || $(SHELL) '$(TESTSUITE)' -C tests --clean >>> >>
diff --git a/tests/automake.mk b/tests/automake.mk index 59f538761..dbcb20dbb 100644 --- a/tests/automake.mk +++ b/tests/automake.mk @@ -361,6 +361,20 @@ check-dpdk: all set $(SHELL) '$(SYSTEM_DPDK_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)'; \ "$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@" --recheck) +# An explicit ordering dependency to avoid race between removing the +# $(DISTCLEANFILES) and running the testsite clean. Unfortunately, it means +# overriding the default distclean-am target. Default distclean-am has all +# dependencies of distclean-custom, plus clean-am. 'manpages.mk' is generated +# every time any target is invoked, so since clean-am is no longer part of the +# last target, we have to call the clean-generic manually to remove the +# generated file. +distclean-am: clean-am + @$(MAKE) distclean-custom +distclean-custom: distclean-compile distclean-generic distclean-hdr \ + distclean-libtool distclean-tags + @$(MAKE) clean-generic +.PHONY: distclean-am distclean-custom + clean-local: test ! -f '$(TESTSUITE)' || $(SHELL) '$(TESTSUITE)' -C tests --clean
The dependency graph of distclean target looks somewhat like this: distclean | +-- distclean-am | +-- clean-am | | | +-- clean-local | +-- clean-generic | +-- distclean-generic So, in a parallel build, clean-am and distclean-generic will be executed in parallel. distclean-generic among other things is responsible for removing DISTCLEANFILES, which are necessary for the 'testsuite --clean' ran by clean-local. This is causing random distcheck failures in CI: test ! -f '../../tests/testsuite' || /bin/bash '../../tests/testsuite' -C tests --clean ... test -z "tests/atconfig tests/atlocal ..." || rm -f tests/atconfig tests/atlocal ... ... ../tests/testsuite: line 4112: ./atconfig: No such file or directory Fix that by introducing an explicit dependency between two targets that will force the testsuite clean to be executed before the generic distribution file cleanup. Some related old conversation: https://lists.gnu.org/archive/html/autoconf/2008-04/msg00012.html GNU make has a special 'target: | dep' syntax for order-only dependencies. But, unfortunately, it doesn't really work for default automake targets, because as soon as automake sees them defined, it assumes that we're re-implementing them and doesn't generate the target content anymore. So, if we try to add order-only dependency for distclean-generic or distclean-am, we have to fully re-implement them. distclean-am seems like an easier candidate to re-implement. There is another unfortunate side effect of clean-am not being in the last step of the make. manpages.mk is autogenerated, but necessary for every target, since it contains parts of the definitions. So, clean-am removes it, and then it gets re-created for execution of distcheck-custom. We need to manually call clean-generic as the very last step in order to remove this file. distcheck will break if distclean doesn't clean everything it needs to clean, so it should not be hard to catch issues if the default automake targets ever change. Fixes: 1b233b95b590 ("Use Autotest for the Open vSwitch test suite.") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> --- tests/automake.mk | 14 ++++++++++++++ 1 file changed, 14 insertions(+)