diff mbox

[2/5] support/download: properly use temp files

Message ID 3c2e529b6f9c542dba662b08b54f844f275e23ee.1404681878.git.yann.morin.1998@free.fr
State Superseded
Headers show

Commit Message

Yann E. MORIN July 6, 2014, 9:27 p.m. UTC
When using 'mv' to move files to temp files, the existing temp file is
forcibly removed by 'mv', and a new file is created.

Although this should have little impact to us, there is still a race
conditions in case two parallel downloads would step on each other's
toes, and it goes like that:

    Process 1                       Process 2
    mktemp tmp_output
    mv tmp_dl tmp_output
        removes tmp_output
                                    mktemp tmp_ouput
        writes to tmp_output
                                    mv tmp_dl tmp_output
                                        removes tmp_output
                                        writes to tmp_output
    mv tmp_output output
                                    mv tmp_output output

In this case, mktemp has the opportunity to create the same tmp_output
temporary file, and we trash the content from one with the content
from the other, and the last to do the final 'mv' breaks horibly.

Instead, use 'cat', which guarantees that tmp_output is not removed
before writing to it.

This complexifies a bit the local-files (cp) helper, but better safe
than sorry.

(Note: this is a purely theoretical issue, I did not observe it yet, but
it is slated to happen one day or the other, and will be very hard to
debug then.)

Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
---
 support/download/bzr  | 2 +-
 support/download/cp   | 9 ++++++---
 support/download/cvs  | 2 +-
 support/download/git  | 4 ++--
 support/download/hg   | 2 +-
 support/download/scp  | 2 +-
 support/download/svn  | 2 +-
 support/download/wget | 2 +-
 8 files changed, 14 insertions(+), 11 deletions(-)

Comments

Arnout Vandecappelle July 7, 2014, 6:11 a.m. UTC | #1
On 06/07/14 23:27, Yann E. MORIN wrote:
> When using 'mv' to move files to temp files, the existing temp file is
> forcibly removed by 'mv', and a new file is created.
> 
> Although this should have little impact to us, there is still a race
> conditions in case two parallel downloads would step on each other's
> toes, and it goes like that:
> 
>     Process 1                       Process 2
>     mktemp tmp_output
>     mv tmp_dl tmp_output
>         removes tmp_output
>                                     mktemp tmp_ouput
>         writes to tmp_output
>                                     mv tmp_dl tmp_output
>                                         removes tmp_output
>                                         writes to tmp_output
>     mv tmp_output output
>                                     mv tmp_output output
> 
> In this case, mktemp has the opportunity to create the same tmp_output
> temporary file, and we trash the content from one with the content
> from the other, and the last to do the final 'mv' breaks horibly.
> 
> Instead, use 'cat', which guarantees that tmp_output is not removed
> before writing to it.

 Not that it makes a real difference, but I think that 'cp' is a more natural
way to do this.

> 
> This complexifies a bit the local-files (cp) helper, but better safe
> than sorry.
> 
> (Note: this is a purely theoretical issue, I did not observe it yet, but
> it is slated to happen one day or the other, and will be very hard to
> debug then.)

 Actually, since the mktemp string is based on time and PID, the chance of this
happening is really vanishingly small. That said, better safe than sorry.


> 
> Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
> ---
>  support/download/bzr  | 2 +-
>  support/download/cp   | 9 ++++++---
>  support/download/cvs  | 2 +-
>  support/download/git  | 4 ++--
>  support/download/hg   | 2 +-
>  support/download/scp  | 2 +-
>  support/download/svn  | 2 +-
>  support/download/wget | 2 +-
>  8 files changed, 14 insertions(+), 11 deletions(-)
> 
> diff --git a/support/download/bzr b/support/download/bzr
> index 19d837d..f86fa82 100755
> --- a/support/download/bzr
> +++ b/support/download/bzr
> @@ -27,7 +27,7 @@ tmp_output="$( mktemp "${output}.XXXXXX" )"
>  
>  ret=1
>  if ${BZR} export --format=tgz "${tmp_dl}" "${repo}" -r "${rev}"; then
> -    if mv "${tmp_dl}" "${tmp_output}"; then
> +    if cat "${tmp_dl}" >"${tmp_output}"; then
>          mv "${tmp_output}" "${output}"
>          ret=0
>      fi

 Not really related to this patch, but why do we need this ${tmp_dl} to begin
with? Especially since we're already "occupying" a tempfile in DL_DIR anyway.

 Also, with this added complexity, the download helper scripts are becoming
quite similar. So wouldn't it be a good idea to refactor them? I'm thinking of
something like this:

support/download/helper:

#!/bin/bash

# We want to catch any command failure, and exit immediately
set -e

# Generic download helper
# Call it with:
#   $1: specific download helper
#   $2: output file
#   ...: arguments for the specific download helper

download="${1}"
output="${2}"
shift 2

tmp_output="$( mktemp ... )"
if "${download}" "${tmp_output}" "$@"; then
	# Blah blah need things to be atomic blah blah
	mv "${tmp_output}" "${output}"
fi



 I expect there will be even more common stuff between the download helpers in
the future, so it makes sense to have this.


> diff --git a/support/download/cp b/support/download/cp
> index 8f6bc06..e73159b 100755
> --- a/support/download/cp
> +++ b/support/download/cp
> @@ -13,12 +13,15 @@ set -e
>  source="${1}"
>  output="${2}"
>  
> +tmp_dl="$( mktemp "${BUILD_DIR}/.XXXXXX" )"
>  tmp_output="$( mktemp "${output}.XXXXXX" )"
>  
>  ret=1
> -if ${LOCALFILES} "${source}" "${tmp_output}"; then
> -    mv "${tmp_output}" "${output}"
> -    ret=0
> +if ${LOCALFILES} "${source}" "${tmp_dl}"; then
> +    if cat "${tmp_dl}" >"${tmp_output}"; then

 Same here, what's the point of ${tmp_dl}?

> +        mv "${tmp_output}" "${output}"
> +        ret=0
> +    fi
>  fi
>  
>  # Cleanup
> diff --git a/support/download/cvs b/support/download/cvs
> index 9aeed79..a8ab080 100755
> --- a/support/download/cvs
> +++ b/support/download/cvs
> @@ -36,7 +36,7 @@ rm -rf "${repodir}"
>  ret=1
>  if ${CVS} -z3 -d":pserver:anonymous@${repo}" \
>             co -d "${repodir}" -r ":${rev}" -P "${rawname}"; then
> -    if tar czf "${tmp_output}" "${repodir}"; then
> +    if tar czf - "${repodir}" >"${tmp_output}"; then
>          mv "${tmp_output}" "${output}"
>          ret=0
>      fi
> diff --git a/support/download/git b/support/download/git
> index badb491..b0031e5 100755
> --- a/support/download/git
> +++ b/support/download/git
> @@ -43,8 +43,8 @@ fi
>  
>  ret=1
>  pushd "${repodir}" >/dev/null
> -if ${GIT} archive --prefix="${basename}/" -o "${tmp_tar}" \
> -                  --format=tar "${cset}"; then
> +if ${GIT} archive --prefix="${basename}/" --format=tar "${cset}" \
> +                  >"${tmp_tar}"; then

 What's the reason for this change? I've checked with strace, and git archive
doesn't seem to unlink, it just does open(O_TRUNC) like cp.


>      if gzip -c "${tmp_tar}" >"${tmp_output}"; then
>          mv "${tmp_output}" "${output}"
>          ret=0
> diff --git a/support/download/hg b/support/download/hg
> index 6e9e26b..8b36db9 100755
> --- a/support/download/hg
> +++ b/support/download/hg
> @@ -35,7 +35,7 @@ ret=1
>  if ${HG} clone --noupdate --rev "${cset}" "${repo}" "${repodir}"; then
>      if ${HG} archive --repository "${repodir}" --type tgz \
>                       --prefix "${basename}" --rev "${cset}" \
> -                     "${tmp_output}"; then
> +                     - >"${tmp_output}"; then

 I didn't check for hg, but I also don't expect it to unlink() first. In fact,
it's mv's behaviour of unlinking first that is surprising.

>          mv "${tmp_output}" "${output}"
>          ret=0
>      fi
> diff --git a/support/download/scp b/support/download/scp
> index d3aad43..e16ece5 100755
> --- a/support/download/scp
> +++ b/support/download/scp
> @@ -17,7 +17,7 @@ tmp_output="$( mktemp "${output}.XXXXXX" )"
>  
>  ret=1
>  if ${SCP} "${url}" "${tmp_dl}"; then
> -    if mv "${tmp_dl}" "${tmp_output}"; then
> +    if cat "${tmp_dl}" >"${tmp_output}"; then
>          mv "${tmp_output}" "${output}"
>          ret=0
>      fi
> diff --git a/support/download/svn b/support/download/svn
> index 39cbbcb..259d3ed 100755
> --- a/support/download/svn
> +++ b/support/download/svn
> @@ -33,7 +33,7 @@ rm -rf "${repodir}"
>  
>  ret=1
>  if ${SVN} export "${repo}@${rev}" "${repodir}"; then
> -    if tar czf "${tmp_output}" "${repodir}"; then
> +    if tar czf - "${repodir}" >"${tmp_output}"; then

 Again, tar doesn't unlink so this isn't needed.



 Regards,
 Arnout

>          mv "${tmp_output}" "${output}"
>          ret=0
>      fi
> diff --git a/support/download/wget b/support/download/wget
> index cbeca3b..0f71108 100755
> --- a/support/download/wget
> +++ b/support/download/wget
> @@ -25,7 +25,7 @@ tmp_output="$( mktemp "${output}.XXXXXX" )"
>  
>  ret=1
>  if ${WGET} -O "${tmp_dl}" "${url}"; then
> -    if mv "${tmp_dl}" "${tmp_output}"; then
> +    if cat "${tmp_dl}" >"${tmp_output}"; then
>          mv "${tmp_output}" "${output}"
>          ret=0
>      fi
>
Yann E. MORIN July 7, 2014, 9:38 p.m. UTC | #2
Arnout, All,

On 2014-07-07 08:11 +0200, Arnout Vandecappelle spake thusly:
> On 06/07/14 23:27, Yann E. MORIN wrote:
> > When using 'mv' to move files to temp files, the existing temp file is
> > forcibly removed by 'mv', and a new file is created.
> > 
> > Although this should have little impact to us, there is still a race
> > conditions in case two parallel downloads would step on each other's
> > toes, and it goes like that:
> > 
> >     Process 1                       Process 2
> >     mktemp tmp_output
> >     mv tmp_dl tmp_output
> >         removes tmp_output
> >                                     mktemp tmp_ouput
> >         writes to tmp_output
> >                                     mv tmp_dl tmp_output
> >                                         removes tmp_output
> >                                         writes to tmp_output
> >     mv tmp_output output
> >                                     mv tmp_output output
> > 
> > In this case, mktemp has the opportunity to create the same tmp_output
> > temporary file, and we trash the content from one with the content
> > from the other, and the last to do the final 'mv' breaks horibly.
> > 
> > Instead, use 'cat', which guarantees that tmp_output is not removed
> > before writing to it.
> 
>  Not that it makes a real difference, but I think that 'cp' is a more natural
> way to do this.

I am not sure how cp handles copying over an existing file. I'll
check...

> > This complexifies a bit the local-files (cp) helper, but better safe
> > than sorry.
> > 
> > (Note: this is a purely theoretical issue, I did not observe it yet, but
> > it is slated to happen one day or the other, and will be very hard to
> > debug then.)
> 
>  Actually, since the mktemp string is based on time and PID, the chance of this
> happening is really vanishingly small. That said, better safe than sorry.

Yes, the opportunity for a collision is very low, but it's not
impossible. After all, that's the problem with race conditions: given
sufficient time and trials, you'll hit it, and it is very hard to debug,
especially in this case, since the condition is not reproducible (random
names.)

> > Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
> > ---
> >  support/download/bzr  | 2 +-
> >  support/download/cp   | 9 ++++++---
> >  support/download/cvs  | 2 +-
> >  support/download/git  | 4 ++--
> >  support/download/hg   | 2 +-
> >  support/download/scp  | 2 +-
> >  support/download/svn  | 2 +-
> >  support/download/wget | 2 +-
> >  8 files changed, 14 insertions(+), 11 deletions(-)
> > 
> > diff --git a/support/download/bzr b/support/download/bzr
> > index 19d837d..f86fa82 100755
> > --- a/support/download/bzr
> > +++ b/support/download/bzr
> > @@ -27,7 +27,7 @@ tmp_output="$( mktemp "${output}.XXXXXX" )"
> >  
> >  ret=1
> >  if ${BZR} export --format=tgz "${tmp_dl}" "${repo}" -r "${rev}"; then
> > -    if mv "${tmp_dl}" "${tmp_output}"; then
> > +    if cat "${tmp_dl}" >"${tmp_output}"; then
> >          mv "${tmp_output}" "${output}"
> >          ret=0
> >      fi
> 
>  Not really related to this patch, but why do we need this ${tmp_dl} to begin
> with? Especially since we're already "occupying" a tempfile in DL_DIR anyway.

The idea was not to pollute BR2_DL_DIR, in case the download fails.
Hence this dance:
  - download to a disposable area (BUILD_DIR);
  - move to a temp file in BR2_DL_DIR;
  - atomically rename to the final file.

But granted, since  we remove failed downloads, we can skip the
intermediary temp file in BUILD_DIR (although I still do not like much
the idea of polluting BR2_DL_DIR.)

I think there is still a case for which we can't remove the temp file...
Ah yes, if the user cancels a build with Ctrl-C, the script is aborted,
so we leave a .XXXXXX file in BR2_DL_DIR.

But even with the three-step dance above, there is still a small window
for leaving out cruft in BR2_DL_DIR; but that window is much, much
smaller than the download window.

>  Also, with this added complexity, the download helper scripts are becoming
> quite similar. So wouldn't it be a good idea to refactor them?

I think we already had this discussion back when I introduced the series:
    http://lists.busybox.net/pipermail/buildroot/2014-January/086826.html

But for now, there are a few important fixes I'd like be applied before
going further.

> I'm thinking of
> something like this:
> 
> support/download/helper:
> 
> #!/bin/bash
> 
> # We want to catch any command failure, and exit immediately
> set -e
> 
> # Generic download helper
> # Call it with:
> #   $1: specific download helper
> #   $2: output file
> #   ...: arguments for the specific download helper
> 
> download="${1}"
> output="${2}"
> shift 2
> 
> tmp_output="$( mktemp ... )"
> if "${download}" "${tmp_output}" "$@"; then
> 	# Blah blah need things to be atomic blah blah
> 	mv "${tmp_output}" "${output}"
> fi

Oh, you meant having a wrapper script above all the other helpers, and
delegate to the helpers only the download, and handle the atomicity
dance to the wrapper. Hmmm... Might be worth looking at, probably, yes.

>  I expect there will be even more common stuff between the download helpers in
> the future, so it makes sense to have this.

Well, for one, moving the hash check in that wrapper, for example...

> > diff --git a/support/download/cp b/support/download/cp
> > index 8f6bc06..e73159b 100755
> > --- a/support/download/cp
> > +++ b/support/download/cp
> > @@ -13,12 +13,15 @@ set -e
> >  source="${1}"
> >  output="${2}"
> >  
> > +tmp_dl="$( mktemp "${BUILD_DIR}/.XXXXXX" )"
> >  tmp_output="$( mktemp "${output}.XXXXXX" )"
> >  
> >  ret=1
> > -if ${LOCALFILES} "${source}" "${tmp_output}"; then
> > -    mv "${tmp_output}" "${output}"
> > -    ret=0
> > +if ${LOCALFILES} "${source}" "${tmp_dl}"; then
> > +    if cat "${tmp_dl}" >"${tmp_output}"; then
> 
>  Same here, what's the point of ${tmp_dl}?

Same answer.

> > +        mv "${tmp_output}" "${output}"
> > +        ret=0
> > +    fi
> >  fi
> >  
> >  # Cleanup
> > diff --git a/support/download/cvs b/support/download/cvs
> > index 9aeed79..a8ab080 100755
> > --- a/support/download/cvs
> > +++ b/support/download/cvs
> > @@ -36,7 +36,7 @@ rm -rf "${repodir}"
> >  ret=1
> >  if ${CVS} -z3 -d":pserver:anonymous@${repo}" \
> >             co -d "${repodir}" -r ":${rev}" -P "${rawname}"; then
> > -    if tar czf "${tmp_output}" "${repodir}"; then
> > +    if tar czf - "${repodir}" >"${tmp_output}"; then
> >          mv "${tmp_output}" "${output}"
> >          ret=0
> >      fi
> > diff --git a/support/download/git b/support/download/git
> > index badb491..b0031e5 100755
> > --- a/support/download/git
> > +++ b/support/download/git
> > @@ -43,8 +43,8 @@ fi
> >  
> >  ret=1
> >  pushd "${repodir}" >/dev/null
> > -if ${GIT} archive --prefix="${basename}/" -o "${tmp_tar}" \
> > -                  --format=tar "${cset}"; then
> > +if ${GIT} archive --prefix="${basename}/" --format=tar "${cset}" \
> > +                  >"${tmp_tar}"; then
> 
>  What's the reason for this change? I've checked with strace, and git archive
> doesn't seem to unlink, it just does open(O_TRUNC) like cp.

Ah, so you did strace both? I straced scp, and it does unlink.
Are we sure all cp and all tar we can encounter will all use
open(O_TRUNC) ?

It'd rather go on the safe side, and not assume much about this,
especially since that behaviour may change in the future, or may not
show in older versions, or different versions.

> >      if gzip -c "${tmp_tar}" >"${tmp_output}"; then
> >          mv "${tmp_output}" "${output}"
> >          ret=0
> > diff --git a/support/download/hg b/support/download/hg
> > index 6e9e26b..8b36db9 100755
> > --- a/support/download/hg
> > +++ b/support/download/hg
> > @@ -35,7 +35,7 @@ ret=1
> >  if ${HG} clone --noupdate --rev "${cset}" "${repo}" "${repodir}"; then
> >      if ${HG} archive --repository "${repodir}" --type tgz \
> >                       --prefix "${basename}" --rev "${cset}" \
> > -                     "${tmp_output}"; then
> > +                     - >"${tmp_output}"; then
> 
>  I didn't check for hg, but I also don't expect it to unlink() first. In fact,
> it's mv's behaviour of unlinking first that is surprising.

Well, scp also unlinks first.

And same answer, I don;t want to rely on such a behaviour. Redirecting
works the same everywhere, though.

> >          mv "${tmp_output}" "${output}"
> >          ret=0
> >      fi
> > diff --git a/support/download/scp b/support/download/scp
> > index d3aad43..e16ece5 100755
> > --- a/support/download/scp
> > +++ b/support/download/scp
> > @@ -17,7 +17,7 @@ tmp_output="$( mktemp "${output}.XXXXXX" )"
> >  
> >  ret=1
> >  if ${SCP} "${url}" "${tmp_dl}"; then
> > -    if mv "${tmp_dl}" "${tmp_output}"; then
> > +    if cat "${tmp_dl}" >"${tmp_output}"; then
> >          mv "${tmp_output}" "${output}"
> >          ret=0
> >      fi
> > diff --git a/support/download/svn b/support/download/svn
> > index 39cbbcb..259d3ed 100755
> > --- a/support/download/svn
> > +++ b/support/download/svn
> > @@ -33,7 +33,7 @@ rm -rf "${repodir}"
> >  
> >  ret=1
> >  if ${SVN} export "${repo}@${rev}" "${repodir}"; then
> > -    if tar czf "${tmp_output}" "${repodir}"; then
> > +    if tar czf - "${repodir}" >"${tmp_output}"; then
> 
>  Again, tar doesn't unlink so this isn't needed.

Ditto.

Regards,
Yann E. MORIN.
Arnout Vandecappelle July 8, 2014, 4:42 p.m. UTC | #3
On 07/07/14 23:38, Yann E. MORIN wrote:
> Arnout, All,
> 
> On 2014-07-07 08:11 +0200, Arnout Vandecappelle spake thusly:
>> On 06/07/14 23:27, Yann E. MORIN wrote:
>>> When using 'mv' to move files to temp files, the existing temp file is
>>> forcibly removed by 'mv', and a new file is created.
>>>
>>> Although this should have little impact to us, there is still a race
>>> conditions in case two parallel downloads would step on each other's
>>> toes, and it goes like that:
>>>
>>>     Process 1                       Process 2
>>>     mktemp tmp_output
>>>     mv tmp_dl tmp_output
>>>         removes tmp_output
>>>                                     mktemp tmp_ouput
>>>         writes to tmp_output
>>>                                     mv tmp_dl tmp_output
>>>                                         removes tmp_output
>>>                                         writes to tmp_output
>>>     mv tmp_output output
>>>                                     mv tmp_output output
>>>
>>> In this case, mktemp has the opportunity to create the same tmp_output
>>> temporary file, and we trash the content from one with the content
>>> from the other, and the last to do the final 'mv' breaks horibly.
>>>
>>> Instead, use 'cat', which guarantees that tmp_output is not removed
>>> before writing to it.
>>
>>  Not that it makes a real difference, but I think that 'cp' is a more natural
>> way to do this.
> 
> I am not sure how cp handles copying over an existing file. I'll
> check...

 It does. It only unlinks if open(O_TRUNC) fails. (Checked with strace for
coreutils and my reading the source for busybox.)


> 
>>> This complexifies a bit the local-files (cp) helper, but better safe
>>> than sorry.
>>>
>>> (Note: this is a purely theoretical issue, I did not observe it yet, but
>>> it is slated to happen one day or the other, and will be very hard to
>>> debug then.)
>>
>>  Actually, since the mktemp string is based on time and PID, the chance of this
>> happening is really vanishingly small. That said, better safe than sorry.
> 
> Yes, the opportunity for a collision is very low, but it's not
> impossible. After all, that's the problem with race conditions: given
> sufficient time and trials, you'll hit it, and it is very hard to debug,
> especially in this case, since the condition is not reproducible (random
> names.)

 Well, since mktemp uses /dev/urandom to generate the file pattern (I was wrong
about time/pid, that's only if urandom is not available), the 6-character random
number has a collision frequency of about 10^10. The chance that such a
collision occurs at exactly the moment that we're doing two parallel downloads
of the same source is pretty darn small - much smaller than the chance of a sha1
collision I think.

> 
>>> Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
>>> ---
>>>  support/download/bzr  | 2 +-
>>>  support/download/cp   | 9 ++++++---
>>>  support/download/cvs  | 2 +-
>>>  support/download/git  | 4 ++--
>>>  support/download/hg   | 2 +-
>>>  support/download/scp  | 2 +-
>>>  support/download/svn  | 2 +-
>>>  support/download/wget | 2 +-
>>>  8 files changed, 14 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/support/download/bzr b/support/download/bzr
>>> index 19d837d..f86fa82 100755
>>> --- a/support/download/bzr
>>> +++ b/support/download/bzr
>>> @@ -27,7 +27,7 @@ tmp_output="$( mktemp "${output}.XXXXXX" )"
>>>  
>>>  ret=1
>>>  if ${BZR} export --format=tgz "${tmp_dl}" "${repo}" -r "${rev}"; then
>>> -    if mv "${tmp_dl}" "${tmp_output}"; then
>>> +    if cat "${tmp_dl}" >"${tmp_output}"; then
>>>          mv "${tmp_output}" "${output}"
>>>          ret=0
>>>      fi
>>
>>  Not really related to this patch, but why do we need this ${tmp_dl} to begin
>> with? Especially since we're already "occupying" a tempfile in DL_DIR anyway.
> 
> The idea was not to pollute BR2_DL_DIR, in case the download fails.
> Hence this dance:
>   - download to a disposable area (BUILD_DIR);
>   - move to a temp file in BR2_DL_DIR;
>   - atomically rename to the final file.
> 
> But granted, since  we remove failed downloads, we can skip the
> intermediary temp file in BUILD_DIR (although I still do not like much
> the idea of polluting BR2_DL_DIR.)

 Well, now you pollute BR2_DL_DIR with ${tmp_output} instead of ${tmp_dl}. Oh,
now I get it: if a download is interrupted, you'll leave the partial download in
BUILD_DIR instead of DL_DIR. OK, sorry about the noise.

> 
> I think there is still a case for which we can't remove the temp file...
> Ah yes, if the user cancels a build with Ctrl-C, the script is aborted,
> so we leave a .XXXXXX file in BR2_DL_DIR.

 Ctrl-C is SIGINT so it can be trap'ped.

> 
> But even with the three-step dance above, there is still a small window
> for leaving out cruft in BR2_DL_DIR; but that window is much, much
> smaller than the download window.

 Ack.

> 
>>  Also, with this added complexity, the download helper scripts are becoming
>> quite similar. So wouldn't it be a good idea to refactor them?
> 
> I think we already had this discussion back when I introduced the series:
>     http://lists.busybox.net/pipermail/buildroot/2014-January/086826.html
> 
> But for now, there are a few important fixes I'd like be applied before
> going further.

 OK.


> 
>> I'm thinking of
>> something like this:
>>
>> support/download/helper:
>>
>> #!/bin/bash
>>
>> # We want to catch any command failure, and exit immediately
>> set -e
>>
>> # Generic download helper
>> # Call it with:
>> #   $1: specific download helper
>> #   $2: output file
>> #   ...: arguments for the specific download helper
>>
>> download="${1}"
>> output="${2}"
>> shift 2
>>
>> tmp_output="$( mktemp ... )"
>> if "${download}" "${tmp_output}" "$@"; then
>> 	# Blah blah need things to be atomic blah blah
>> 	mv "${tmp_output}" "${output}"
>> fi
> 
> Oh, you meant having a wrapper script above all the other helpers, and
> delegate to the helpers only the download, and handle the atomicity
> dance to the wrapper. Hmmm... Might be worth looking at, probably, yes.

 But if you have more urgent fixes, that's OK.

> 
>>  I expect there will be even more common stuff between the download helpers in
>> the future, so it makes sense to have this.
> 
> Well, for one, moving the hash check in that wrapper, for example...
> 
[snip]
>>> diff --git a/support/download/git b/support/download/git
>>> index badb491..b0031e5 100755
>>> --- a/support/download/git
>>> +++ b/support/download/git
>>> @@ -43,8 +43,8 @@ fi
>>>  
>>>  ret=1
>>>  pushd "${repodir}" >/dev/null
>>> -if ${GIT} archive --prefix="${basename}/" -o "${tmp_tar}" \
>>> -                  --format=tar "${cset}"; then
>>> +if ${GIT} archive --prefix="${basename}/" --format=tar "${cset}" \
>>> +                  >"${tmp_tar}"; then
>>
>>  What's the reason for this change? I've checked with strace, and git archive
>> doesn't seem to unlink, it just does open(O_TRUNC) like cp.
> 
> Ah, so you did strace both? I straced scp, and it does unlink.
> Are we sure all cp and all tar we can encounter will all use
> open(O_TRUNC) ?

 Well, the unlink behaviour of mv is rather surprising if you ask me. Normally
when you create a file that already exists, you'd want it to retain the
attributes it has (owner, group, mod, acl, symlink, hardlinks).

> 
> It'd rather go on the safe side, and not assume much about this,
> especially since that behaviour may change in the future, or may not
> show in older versions, or different versions.

 OK, fair enough.

> 
>>>      if gzip -c "${tmp_tar}" >"${tmp_output}"; then
>>>          mv "${tmp_output}" "${output}"
>>>          ret=0
>>> diff --git a/support/download/hg b/support/download/hg
>>> index 6e9e26b..8b36db9 100755
>>> --- a/support/download/hg
>>> +++ b/support/download/hg
>>> @@ -35,7 +35,7 @@ ret=1
>>>  if ${HG} clone --noupdate --rev "${cset}" "${repo}" "${repodir}"; then
>>>      if ${HG} archive --repository "${repodir}" --type tgz \
>>>                       --prefix "${basename}" --rev "${cset}" \
>>> -                     "${tmp_output}"; then
>>> +                     - >"${tmp_output}"; then
>>
>>  I didn't check for hg, but I also don't expect it to unlink() first. In fact,
>> it's mv's behaviour of unlinking first that is surprising.
> 
> Well, scp also unlinks first.

 No it doesn't... Or else I'm doing something really wrong with my strace...


 Regards,
 Arnout

> 
> And same answer, I don;t want to rely on such a behaviour. Redirecting
> works the same everywhere, though.
> 
>>>          mv "${tmp_output}" "${output}"
>>>          ret=0
>>>      fi
>>> diff --git a/support/download/scp b/support/download/scp
>>> index d3aad43..e16ece5 100755
>>> --- a/support/download/scp
>>> +++ b/support/download/scp
>>> @@ -17,7 +17,7 @@ tmp_output="$( mktemp "${output}.XXXXXX" )"
>>>  
>>>  ret=1
>>>  if ${SCP} "${url}" "${tmp_dl}"; then
>>> -    if mv "${tmp_dl}" "${tmp_output}"; then
>>> +    if cat "${tmp_dl}" >"${tmp_output}"; then
>>>          mv "${tmp_output}" "${output}"
>>>          ret=0
>>>      fi
>>> diff --git a/support/download/svn b/support/download/svn
>>> index 39cbbcb..259d3ed 100755
>>> --- a/support/download/svn
>>> +++ b/support/download/svn
>>> @@ -33,7 +33,7 @@ rm -rf "${repodir}"
>>>  
>>>  ret=1
>>>  if ${SVN} export "${repo}@${rev}" "${repodir}"; then
>>> -    if tar czf "${tmp_output}" "${repodir}"; then
>>> +    if tar czf - "${repodir}" >"${tmp_output}"; then
>>
>>  Again, tar doesn't unlink so this isn't needed.
> 
> Ditto.
> 
> Regards,
> Yann E. MORIN.
>
Yann E. MORIN July 8, 2014, 9:52 p.m. UTC | #4
Arnout, All,

On 2014-07-08 18:42 +0200, Arnout Vandecappelle spake thusly:
> On 07/07/14 23:38, Yann E. MORIN wrote:
> > On 2014-07-07 08:11 +0200, Arnout Vandecappelle spake thusly:
[--SNIP--]
> >>  Not that it makes a real difference, but I think that 'cp' is a more natural
> >> way to do this.
> > 
> > I am not sure how cp handles copying over an existing file. I'll
> > check...
> 
>  It does. It only unlinks if open(O_TRUNC) fails. (Checked with strace for
> coreutils and my reading the source for busybox.)

I'll rework the entire series to take your and Jacmet's comments in
consideration.

Thanks for the reviews! :-)

Regards,
Yann E. MORIN.
Thomas Petazzoni July 9, 2014, 7:45 a.m. UTC | #5
Yann, Arnout,

On Mon, 7 Jul 2014 23:38:02 +0200, Yann E. MORIN wrote:

> >  Not really related to this patch, but why do we need this ${tmp_dl} to begin
> > with? Especially since we're already "occupying" a tempfile in DL_DIR anyway.
> 
> The idea was not to pollute BR2_DL_DIR, in case the download fails.
> Hence this dance:
>   - download to a disposable area (BUILD_DIR);
>   - move to a temp file in BR2_DL_DIR;
>   - atomically rename to the final file.

And also because $(DL_DIR) and $(BUILD_DIR) might be on different
filesystems, so the rename/move of the file from $(BUILD_DIR) (where it
was downloaded) to $(DL_DIR) may not be atomic. Hence the idea is to:

 - Download in $(BUILD_DIR)

 - Move to a temporary file in $(DL_DIR). This operation may not be
   atomic if $(DL_DIR) is not on the same filesystem as $(BUILD_DIR)

 - Finally rename the temporary file to the expected file name in
   $(DL_DIR). Since we're in the same directory, it's guaranteed to be
   atomic.

This allows to ensure that when the final file appears in $(DL_DIR),
we're sure the download is finished, and that therefore concurrently
executing Buildroot instances will either not see the downloaded file,
or see a fully completed downloaded file.

Or am I missing the point of the discussion here?

Thanks,

Thomas
Yann E. MORIN July 10, 2014, 3:59 p.m. UTC | #6
Thomas, All,

On 2014-07-09 09:45 +0200, Thomas Petazzoni spake thusly:
> On Mon, 7 Jul 2014 23:38:02 +0200, Yann E. MORIN wrote:
> 
> > >  Not really related to this patch, but why do we need this ${tmp_dl} to begin
> > > with? Especially since we're already "occupying" a tempfile in DL_DIR anyway.
> > 
> > The idea was not to pollute BR2_DL_DIR, in case the download fails.
> > Hence this dance:
> >   - download to a disposable area (BUILD_DIR);
> >   - move to a temp file in BR2_DL_DIR;
> >   - atomically rename to the final file.
> 
> And also because $(DL_DIR) and $(BUILD_DIR) might be on different
> filesystems, so the rename/move of the file from $(BUILD_DIR) (where it
> was downloaded) to $(DL_DIR) may not be atomic. Hence the idea is to:
> 
>  - Download in $(BUILD_DIR)
> 
>  - Move to a temporary file in $(DL_DIR). This operation may not be
>    atomic if $(DL_DIR) is not on the same filesystem as $(BUILD_DIR)
> 
>  - Finally rename the temporary file to the expected file name in
>    $(DL_DIR). Since we're in the same directory, it's guaranteed to be
>    atomic.
> 
> This allows to ensure that when the final file appears in $(DL_DIR),
> we're sure the download is finished, and that therefore concurrently
> executing Buildroot instances will either not see the downloaded file,
> or see a fully completed downloaded file.
> 
> Or am I missing the point of the discussion here?

Not as far as I'm concerned.

Except that Arnout's (and Peter's) concerns where that the way I
implemented that was not optimal, and I used convoluted constructs to
achieve this non-cluttering and atomicity.

I have to re-write my copy! ;-)

Regards,
Yann E. MORIN.
diff mbox

Patch

diff --git a/support/download/bzr b/support/download/bzr
index 19d837d..f86fa82 100755
--- a/support/download/bzr
+++ b/support/download/bzr
@@ -27,7 +27,7 @@  tmp_output="$( mktemp "${output}.XXXXXX" )"
 
 ret=1
 if ${BZR} export --format=tgz "${tmp_dl}" "${repo}" -r "${rev}"; then
-    if mv "${tmp_dl}" "${tmp_output}"; then
+    if cat "${tmp_dl}" >"${tmp_output}"; then
         mv "${tmp_output}" "${output}"
         ret=0
     fi
diff --git a/support/download/cp b/support/download/cp
index 8f6bc06..e73159b 100755
--- a/support/download/cp
+++ b/support/download/cp
@@ -13,12 +13,15 @@  set -e
 source="${1}"
 output="${2}"
 
+tmp_dl="$( mktemp "${BUILD_DIR}/.XXXXXX" )"
 tmp_output="$( mktemp "${output}.XXXXXX" )"
 
 ret=1
-if ${LOCALFILES} "${source}" "${tmp_output}"; then
-    mv "${tmp_output}" "${output}"
-    ret=0
+if ${LOCALFILES} "${source}" "${tmp_dl}"; then
+    if cat "${tmp_dl}" >"${tmp_output}"; then
+        mv "${tmp_output}" "${output}"
+        ret=0
+    fi
 fi
 
 # Cleanup
diff --git a/support/download/cvs b/support/download/cvs
index 9aeed79..a8ab080 100755
--- a/support/download/cvs
+++ b/support/download/cvs
@@ -36,7 +36,7 @@  rm -rf "${repodir}"
 ret=1
 if ${CVS} -z3 -d":pserver:anonymous@${repo}" \
            co -d "${repodir}" -r ":${rev}" -P "${rawname}"; then
-    if tar czf "${tmp_output}" "${repodir}"; then
+    if tar czf - "${repodir}" >"${tmp_output}"; then
         mv "${tmp_output}" "${output}"
         ret=0
     fi
diff --git a/support/download/git b/support/download/git
index badb491..b0031e5 100755
--- a/support/download/git
+++ b/support/download/git
@@ -43,8 +43,8 @@  fi
 
 ret=1
 pushd "${repodir}" >/dev/null
-if ${GIT} archive --prefix="${basename}/" -o "${tmp_tar}" \
-                  --format=tar "${cset}"; then
+if ${GIT} archive --prefix="${basename}/" --format=tar "${cset}" \
+                  >"${tmp_tar}"; then
     if gzip -c "${tmp_tar}" >"${tmp_output}"; then
         mv "${tmp_output}" "${output}"
         ret=0
diff --git a/support/download/hg b/support/download/hg
index 6e9e26b..8b36db9 100755
--- a/support/download/hg
+++ b/support/download/hg
@@ -35,7 +35,7 @@  ret=1
 if ${HG} clone --noupdate --rev "${cset}" "${repo}" "${repodir}"; then
     if ${HG} archive --repository "${repodir}" --type tgz \
                      --prefix "${basename}" --rev "${cset}" \
-                     "${tmp_output}"; then
+                     - >"${tmp_output}"; then
         mv "${tmp_output}" "${output}"
         ret=0
     fi
diff --git a/support/download/scp b/support/download/scp
index d3aad43..e16ece5 100755
--- a/support/download/scp
+++ b/support/download/scp
@@ -17,7 +17,7 @@  tmp_output="$( mktemp "${output}.XXXXXX" )"
 
 ret=1
 if ${SCP} "${url}" "${tmp_dl}"; then
-    if mv "${tmp_dl}" "${tmp_output}"; then
+    if cat "${tmp_dl}" >"${tmp_output}"; then
         mv "${tmp_output}" "${output}"
         ret=0
     fi
diff --git a/support/download/svn b/support/download/svn
index 39cbbcb..259d3ed 100755
--- a/support/download/svn
+++ b/support/download/svn
@@ -33,7 +33,7 @@  rm -rf "${repodir}"
 
 ret=1
 if ${SVN} export "${repo}@${rev}" "${repodir}"; then
-    if tar czf "${tmp_output}" "${repodir}"; then
+    if tar czf - "${repodir}" >"${tmp_output}"; then
         mv "${tmp_output}" "${output}"
         ret=0
     fi
diff --git a/support/download/wget b/support/download/wget
index cbeca3b..0f71108 100755
--- a/support/download/wget
+++ b/support/download/wget
@@ -25,7 +25,7 @@  tmp_output="$( mktemp "${output}.XXXXXX" )"
 
 ret=1
 if ${WGET} -O "${tmp_dl}" "${url}"; then
-    if mv "${tmp_dl}" "${tmp_output}"; then
+    if cat "${tmp_dl}" >"${tmp_output}"; then
         mv "${tmp_output}" "${output}"
         ret=0
     fi