diff mbox

[v2] openssl: fix race condition when symlink shared libs

Message ID 1448036610-7077-1-git-send-email-ryan.barnett@rockwellcollins.com
State Superseded
Headers show

Commit Message

Ryan Barnett Nov. 20, 2015, 4:23 p.m. UTC
The build-shared target depends on do_crypto and link-shared, which
will be executed in parallel. do_crypto calls
link_a.linux_shared -> link_a.gnu which does SYMLINK_SO; in parallel,
link-shared calls symlink.linux_shared which also does SYMLINK_SO.
Before the symlink is created, it is rm'ed, but there is a tiny chance
that the second one is created after the rm has been called.

Fix this by using 'ln -sf' instead of 'ln -s' so the build doesn't
error out.

Patch submitted upstream at:
  https://bugs.gentoo.org/show_bug.cgi?id=566260

Thanks to Arnout for explain the issue (wording used above).

Signed-off-by: Ryan Barnett <ryan.barnett@rockwellcollins.com>
CC: Arnout Vandecappelle <arnout@mind.be>
CC: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>

---

Note this is a temporary fix until a gentoo maintainer weighs in and
updates their parallel build patch.

 v1 -> v2
   Remove the 'rm -f' since we are using 'ln -sf' (Arnout originally)
---
 ...ared-fix-race-condition-when-symlinking-s.patch | 48 ++++++++++++++++++++++
 1 file changed, 48 insertions(+)
 create mode 100644 package/openssl/0003-makefile.shared-fix-race-condition-when-symlinking-s.patch

Comments

Ryan Barnett Nov. 21, 2015, 12:55 a.m. UTC | #1
All,

On Fri, Nov 20, 2015 at 10:23 AM, Ryan Barnett
<ryan.barnett@rockwellcollins.com> wrote:
> The build-shared target depends on do_crypto and link-shared, which
> will be executed in parallel. do_crypto calls
> link_a.linux_shared -> link_a.gnu which does SYMLINK_SO; in parallel,
> link-shared calls symlink.linux_shared which also does SYMLINK_SO.
> Before the symlink is created, it is rm'ed, but there is a tiny chance
> that the second one is created after the rm has been called.
>
> Fix this by using 'ln -sf' instead of 'ln -s' so the build doesn't
> error out.
>
> Patch submitted upstream at:
>   https://bugs.gentoo.org/show_bug.cgi?id=566260

So it doesn't appear that using 'ln -sf' is a valid fix as pointed out
in the bug report on the gentoo forum:

 -- Comment #2 from SpanKY <vapier@gentoo.org> ---
 `ln -sf` is no more atomic than `rm; ln`.  there is still a window where things
 can fail.  you can easily check this:

   i=0; while [ $((i++)) -lt 500 ]; do ln -sf a b & :; done

 a good number of those will fail with:
 ln: failed to create symbolic link ‘b’: File exists

 the deps need to be fixed up

So how do we want to proceed with this? Do we revert the patch for
parallel builds for openssl?

Thanks,
-Ryan
Arnout Vandecappelle Nov. 21, 2015, 1:24 p.m. UTC | #2
On 20-11-15 17:23, Ryan Barnett wrote:
> The build-shared target depends on do_crypto and link-shared, which
> will be executed in parallel. do_crypto calls
> link_a.linux_shared -> link_a.gnu which does SYMLINK_SO; in parallel,
> link-shared calls symlink.linux_shared which also does SYMLINK_SO.
> Before the symlink is created, it is rm'ed, but there is a tiny chance
> that the second one is created after the rm has been called.
> 
> Fix this by using 'ln -sf' instead of 'ln -s' so the build doesn't
> error out.
> 
> Patch submitted upstream at:
>   https://bugs.gentoo.org/show_bug.cgi?id=566260

 Ahem, gentoo is not exactly upstream :-)  From [1]:

To report a bug or make an enhancement request, send email to rt@openssl.org. In
the subject line, please make sure to indicate if it's a bug or a fix, and a
brief description of the issue. In the body of your mail, please include the
versions of the operating system and OpenSSL you are using. If you have a patch
or diff, please send it as an attachment, and not inline in the message body.

 As far as I can see, this patch should still apply (in slightly adapted form)
to the real upstream.

> 
> Thanks to Arnout for explain the issue (wording used above).

 You're a native English speaker, right? explain -> explaining :-)


 Some more minor things below, but anyway more for the upstreams. So

Reviewed-by: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be>


> 
> Signed-off-by: Ryan Barnett <ryan.barnett@rockwellcollins.com>
> CC: Arnout Vandecappelle <arnout@mind.be>
> CC: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> 
> ---
> 
> Note this is a temporary fix until a gentoo maintainer weighs in and
> updates their parallel build patch.
> 
>  v1 -> v2
>    Remove the 'rm -f' since we are using 'ln -sf' (Arnout originally)
> ---
>  ...ared-fix-race-condition-when-symlinking-s.patch | 48 ++++++++++++++++++++++
>  1 file changed, 48 insertions(+)
>  create mode 100644 package/openssl/0003-makefile.shared-fix-race-condition-when-symlinking-s.patch
> 
> diff --git a/package/openssl/0003-makefile.shared-fix-race-condition-when-symlinking-s.patch b/package/openssl/0003-makefile.shared-fix-race-condition-when-symlinking-s.patch
> new file mode 100644
> index 0000000..d6672f4
> --- /dev/null
> +++ b/package/openssl/0003-makefile.shared-fix-race-condition-when-symlinking-s.patch
> @@ -0,0 +1,48 @@
> +From 878793890a9b84ff313397c56478d7c6f770b2d2 Mon Sep 17 00:00:00 2001
> +From: Ryan Barnett <ryan.barnett@rockwellcollins.com>
> +Date: Thu, 19 Nov 2015 10:47:00 -0600
> +Subject: [PATCH] makefile.shared: fix race condition when symlinking shared
> + libs
> +
> +The build-shared target depends on do_crypto and link-shared, which
> +will be executed in parallel. do_crypto calls
> +link_a.linux_shared -> link_a.gnu which does SYMLINK_SO; in parallel,
> +link-shared calls symlink.linux_shared which also does SYMLINK_SO.
> +Before the symlink is created, it is rm'ed, but there is a tiny chance
> +that the second one is created after the rm has been called.
> +
> +Fix this race condition by just using ln -sf since it will be the same
> +symlink regards and not cause the build to error out.

 regards -> regardless

> +
> +Signed-off-by: Ryan Barnett <ryan.barnett@rockwellcollins.com>
> +---
> + Makefile.shared | 8 ++++----
> + 1 file changed, 4 insertions(+), 4 deletions(-)
> +
> +diff --git a/Makefile.shared b/Makefile.shared
> +index 8d57163..8ab3d18 100644
> +--- a/Makefile.shared
> ++++ b/Makefile.shared
> +@@ -117,15 +117,15 @@ SYMLINK_SO=	\
> + 		prev=$$SHLIB$$SHLIB_SOVER$$SHLIB_SUFFIX; \
> + 		if [ -n "$$SHLIB_COMPAT" ]; then \
> + 			for x in $$SHLIB_COMPAT; do \
> +-				( $(SET_X); rm -f $$SHLIB$$x$$SHLIB_SUFFIX; \
> +-				  ln -s $$prev $$SHLIB$$x$$SHLIB_SUFFIX ); \
> ++				( $(SET_X);  \
> ++				  ln -sf $$prev $$SHLIB$$x$$SHLIB_SUFFIX ); \
> + 				prev=$$SHLIB$$x$$SHLIB_SUFFIX; \
> + 			done; \
> + 		fi; \
> + 		if [ -n "$$SHLIB_SOVER" ]; then \
> + 			[ -e "$$SHLIB$$SHLIB_SUFFIX" ] || \

 This check is not really needed anymore.


 Regards,
 Arnout


[1] https://www.openssl.org/community/


> +-			( $(SET_X); rm -f $$SHLIB$$SHLIB_SUFFIX; \
> +-			  ln -s $$prev $$SHLIB$$SHLIB_SUFFIX ); \
> ++			( $(SET_X); \
> ++			  ln -sf $$prev $$SHLIB$$SHLIB_SUFFIX ); \
> + 		fi; \
> + 	fi
> + 
> +-- 
> +1.9.1
> +
>
Ryan Barnett Nov. 21, 2015, 4:32 p.m. UTC | #3
Arnout,

On Sat, Nov 21, 2015 at 7:24 AM, Arnout Vandecappelle <arnout@mind.be> wrote:
> On 20-11-15 17:23, Ryan Barnett wrote:
>> The build-shared target depends on do_crypto and link-shared, which
>> will be executed in parallel. do_crypto calls
>> link_a.linux_shared -> link_a.gnu which does SYMLINK_SO; in parallel,
>> link-shared calls symlink.linux_shared which also does SYMLINK_SO.
>> Before the symlink is created, it is rm'ed, but there is a tiny chance
>> that the second one is created after the rm has been called.
>>
>> Fix this by using 'ln -sf' instead of 'ln -s' so the build doesn't
>> error out.
>>
>> Patch submitted upstream at:
>>   https://bugs.gentoo.org/show_bug.cgi?id=566260
>
>  Ahem, gentoo is not exactly upstream :-)  From [1]:
>
> To report a bug or make an enhancement request, send email to rt@openssl.org. In
> the subject line, please make sure to indicate if it's a bug or a fix, and a
> brief description of the issue. In the body of your mail, please include the
> versions of the operating system and OpenSSL you are using. If you have a patch
> or diff, please send it as an attachment, and not inline in the message body.
>
>  As far as I can see, this patch should still apply (in slightly adapted form)
> to the real upstream.

Yes I agree to submit it upstream directly to openssl. However, I
started with Gentoo since there are the ones that seem most interested
in making OpenSSL work in parallel. I have also just sent an email to
OpenSSL about the issue.

I think this patch is pointless now since it doesn't actually fix the
problem that existed before. Please see my reply to this patch earlier
describing that 'ln -sf' isn't an atomic operation and the error from
the build failures still will exist:

http://lists.busybox.net/pipermail/buildroot/2015-November/144882.html

>>
>> Thanks to Arnout for explain the issue (wording used above).
>
>  You're a native English speaker, right? explain -> explaining :-)

Yes but I never claimed to be great at translating my English thoughts
to properly written sentence ;) This is especially true when I type up
something in a hurry which this commit message was.

Thanks,
-Ryan
Arnout Vandecappelle Nov. 21, 2015, 11:46 p.m. UTC | #4
On 21-11-15 01:55, Ryan Barnett wrote:
> All,
> 
> On Fri, Nov 20, 2015 at 10:23 AM, Ryan Barnett
> <ryan.barnett@rockwellcollins.com> wrote:
>> The build-shared target depends on do_crypto and link-shared, which
>> will be executed in parallel. do_crypto calls
>> link_a.linux_shared -> link_a.gnu which does SYMLINK_SO; in parallel,
>> link-shared calls symlink.linux_shared which also does SYMLINK_SO.
>> Before the symlink is created, it is rm'ed, but there is a tiny chance
>> that the second one is created after the rm has been called.
>>
>> Fix this by using 'ln -sf' instead of 'ln -s' so the build doesn't
>> error out.
>>
>> Patch submitted upstream at:
>>   https://bugs.gentoo.org/show_bug.cgi?id=566260
> 
> So it doesn't appear that using 'ln -sf' is a valid fix as pointed out
> in the bug report on the gentoo forum:
> 
>  -- Comment #2 from SpanKY <vapier@gentoo.org> ---
>  `ln -sf` is no more atomic than `rm; ln`.  there is still a window where things
>  can fail.  you can easily check this:
> 
>    i=0; while [ $((i++)) -lt 500 ]; do ln -sf a b & :; done
> 
>  a good number of those will fail with:
>  ln: failed to create symbolic link ‘b’: File exists
> 
>  the deps need to be fixed up
> 
> So how do we want to proceed with this? Do we revert the patch for
> parallel builds for openssl?

 I think for 2015.11 at least we should indeed revert it.

 The symlink can be created atomically by going through a temp file, but that's
a bit crazy...

 Regards,
 Arnout

> 
> Thanks,
> -Ryan
>
Mike Frysinger Nov. 25, 2015, 7:31 p.m. UTC | #5
On 21 Nov 2015 14:24, Arnout Vandecappelle wrote:
> On 20-11-15 17:23, Ryan Barnett wrote:
> > The build-shared target depends on do_crypto and link-shared, which
> > will be executed in parallel. do_crypto calls
> > link_a.linux_shared -> link_a.gnu which does SYMLINK_SO; in parallel,
> > link-shared calls symlink.linux_shared which also does SYMLINK_SO.
> > Before the symlink is created, it is rm'ed, but there is a tiny chance
> > that the second one is created after the rm has been called.
> > 
> > Fix this by using 'ln -sf' instead of 'ln -s' so the build doesn't
> > error out.
> > 
> > Patch submitted upstream at:
> >   https://bugs.gentoo.org/show_bug.cgi?id=566260
> 
>  Ahem, gentoo is not exactly upstream :-)  From [1]:
> 
> To report a bug or make an enhancement request, send email to rt@openssl.org.

which these have been already.  they don't respond.  in fact, you can see
a link to the relevant upstream report in every Gentoo parallel patch.
-mike
diff mbox

Patch

diff --git a/package/openssl/0003-makefile.shared-fix-race-condition-when-symlinking-s.patch b/package/openssl/0003-makefile.shared-fix-race-condition-when-symlinking-s.patch
new file mode 100644
index 0000000..d6672f4
--- /dev/null
+++ b/package/openssl/0003-makefile.shared-fix-race-condition-when-symlinking-s.patch
@@ -0,0 +1,48 @@ 
+From 878793890a9b84ff313397c56478d7c6f770b2d2 Mon Sep 17 00:00:00 2001
+From: Ryan Barnett <ryan.barnett@rockwellcollins.com>
+Date: Thu, 19 Nov 2015 10:47:00 -0600
+Subject: [PATCH] makefile.shared: fix race condition when symlinking shared
+ libs
+
+The build-shared target depends on do_crypto and link-shared, which
+will be executed in parallel. do_crypto calls
+link_a.linux_shared -> link_a.gnu which does SYMLINK_SO; in parallel,
+link-shared calls symlink.linux_shared which also does SYMLINK_SO.
+Before the symlink is created, it is rm'ed, but there is a tiny chance
+that the second one is created after the rm has been called.
+
+Fix this race condition by just using ln -sf since it will be the same
+symlink regards and not cause the build to error out.
+
+Signed-off-by: Ryan Barnett <ryan.barnett@rockwellcollins.com>
+---
+ Makefile.shared | 8 ++++----
+ 1 file changed, 4 insertions(+), 4 deletions(-)
+
+diff --git a/Makefile.shared b/Makefile.shared
+index 8d57163..8ab3d18 100644
+--- a/Makefile.shared
++++ b/Makefile.shared
+@@ -117,15 +117,15 @@ SYMLINK_SO=	\
+ 		prev=$$SHLIB$$SHLIB_SOVER$$SHLIB_SUFFIX; \
+ 		if [ -n "$$SHLIB_COMPAT" ]; then \
+ 			for x in $$SHLIB_COMPAT; do \
+-				( $(SET_X); rm -f $$SHLIB$$x$$SHLIB_SUFFIX; \
+-				  ln -s $$prev $$SHLIB$$x$$SHLIB_SUFFIX ); \
++				( $(SET_X);  \
++				  ln -sf $$prev $$SHLIB$$x$$SHLIB_SUFFIX ); \
+ 				prev=$$SHLIB$$x$$SHLIB_SUFFIX; \
+ 			done; \
+ 		fi; \
+ 		if [ -n "$$SHLIB_SOVER" ]; then \
+ 			[ -e "$$SHLIB$$SHLIB_SUFFIX" ] || \
+-			( $(SET_X); rm -f $$SHLIB$$SHLIB_SUFFIX; \
+-			  ln -s $$prev $$SHLIB$$SHLIB_SUFFIX ); \
++			( $(SET_X); \
++			  ln -sf $$prev $$SHLIB$$SHLIB_SUFFIX ); \
+ 		fi; \
+ 	fi
+ 
+-- 
+1.9.1
+