Message ID | 3c2e529b6f9c542dba662b08b54f844f275e23ee.1404681878.git.yann.morin.1998@free.fr |
---|---|
State | Superseded |
Headers | show |
On 06/07/14 23:27, Yann E. MORIN wrote: > When using 'mv' to move files to temp files, the existing temp file is > forcibly removed by 'mv', and a new file is created. > > Although this should have little impact to us, there is still a race > conditions in case two parallel downloads would step on each other's > toes, and it goes like that: > > Process 1 Process 2 > mktemp tmp_output > mv tmp_dl tmp_output > removes tmp_output > mktemp tmp_ouput > writes to tmp_output > mv tmp_dl tmp_output > removes tmp_output > writes to tmp_output > mv tmp_output output > mv tmp_output output > > In this case, mktemp has the opportunity to create the same tmp_output > temporary file, and we trash the content from one with the content > from the other, and the last to do the final 'mv' breaks horibly. > > Instead, use 'cat', which guarantees that tmp_output is not removed > before writing to it. Not that it makes a real difference, but I think that 'cp' is a more natural way to do this. > > This complexifies a bit the local-files (cp) helper, but better safe > than sorry. > > (Note: this is a purely theoretical issue, I did not observe it yet, but > it is slated to happen one day or the other, and will be very hard to > debug then.) Actually, since the mktemp string is based on time and PID, the chance of this happening is really vanishingly small. That said, better safe than sorry. > > Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr> > --- > support/download/bzr | 2 +- > support/download/cp | 9 ++++++--- > support/download/cvs | 2 +- > support/download/git | 4 ++-- > support/download/hg | 2 +- > support/download/scp | 2 +- > support/download/svn | 2 +- > support/download/wget | 2 +- > 8 files changed, 14 insertions(+), 11 deletions(-) > > diff --git a/support/download/bzr b/support/download/bzr > index 19d837d..f86fa82 100755 > --- a/support/download/bzr > +++ b/support/download/bzr > @@ -27,7 +27,7 @@ tmp_output="$( mktemp "${output}.XXXXXX" )" > > ret=1 > if ${BZR} export --format=tgz "${tmp_dl}" "${repo}" -r "${rev}"; then > - if mv "${tmp_dl}" "${tmp_output}"; then > + if cat "${tmp_dl}" >"${tmp_output}"; then > mv "${tmp_output}" "${output}" > ret=0 > fi Not really related to this patch, but why do we need this ${tmp_dl} to begin with? Especially since we're already "occupying" a tempfile in DL_DIR anyway. Also, with this added complexity, the download helper scripts are becoming quite similar. So wouldn't it be a good idea to refactor them? I'm thinking of something like this: support/download/helper: #!/bin/bash # We want to catch any command failure, and exit immediately set -e # Generic download helper # Call it with: # $1: specific download helper # $2: output file # ...: arguments for the specific download helper download="${1}" output="${2}" shift 2 tmp_output="$( mktemp ... )" if "${download}" "${tmp_output}" "$@"; then # Blah blah need things to be atomic blah blah mv "${tmp_output}" "${output}" fi I expect there will be even more common stuff between the download helpers in the future, so it makes sense to have this. > diff --git a/support/download/cp b/support/download/cp > index 8f6bc06..e73159b 100755 > --- a/support/download/cp > +++ b/support/download/cp > @@ -13,12 +13,15 @@ set -e > source="${1}" > output="${2}" > > +tmp_dl="$( mktemp "${BUILD_DIR}/.XXXXXX" )" > tmp_output="$( mktemp "${output}.XXXXXX" )" > > ret=1 > -if ${LOCALFILES} "${source}" "${tmp_output}"; then > - mv "${tmp_output}" "${output}" > - ret=0 > +if ${LOCALFILES} "${source}" "${tmp_dl}"; then > + if cat "${tmp_dl}" >"${tmp_output}"; then Same here, what's the point of ${tmp_dl}? > + mv "${tmp_output}" "${output}" > + ret=0 > + fi > fi > > # Cleanup > diff --git a/support/download/cvs b/support/download/cvs > index 9aeed79..a8ab080 100755 > --- a/support/download/cvs > +++ b/support/download/cvs > @@ -36,7 +36,7 @@ rm -rf "${repodir}" > ret=1 > if ${CVS} -z3 -d":pserver:anonymous@${repo}" \ > co -d "${repodir}" -r ":${rev}" -P "${rawname}"; then > - if tar czf "${tmp_output}" "${repodir}"; then > + if tar czf - "${repodir}" >"${tmp_output}"; then > mv "${tmp_output}" "${output}" > ret=0 > fi > diff --git a/support/download/git b/support/download/git > index badb491..b0031e5 100755 > --- a/support/download/git > +++ b/support/download/git > @@ -43,8 +43,8 @@ fi > > ret=1 > pushd "${repodir}" >/dev/null > -if ${GIT} archive --prefix="${basename}/" -o "${tmp_tar}" \ > - --format=tar "${cset}"; then > +if ${GIT} archive --prefix="${basename}/" --format=tar "${cset}" \ > + >"${tmp_tar}"; then What's the reason for this change? I've checked with strace, and git archive doesn't seem to unlink, it just does open(O_TRUNC) like cp. > if gzip -c "${tmp_tar}" >"${tmp_output}"; then > mv "${tmp_output}" "${output}" > ret=0 > diff --git a/support/download/hg b/support/download/hg > index 6e9e26b..8b36db9 100755 > --- a/support/download/hg > +++ b/support/download/hg > @@ -35,7 +35,7 @@ ret=1 > if ${HG} clone --noupdate --rev "${cset}" "${repo}" "${repodir}"; then > if ${HG} archive --repository "${repodir}" --type tgz \ > --prefix "${basename}" --rev "${cset}" \ > - "${tmp_output}"; then > + - >"${tmp_output}"; then I didn't check for hg, but I also don't expect it to unlink() first. In fact, it's mv's behaviour of unlinking first that is surprising. > mv "${tmp_output}" "${output}" > ret=0 > fi > diff --git a/support/download/scp b/support/download/scp > index d3aad43..e16ece5 100755 > --- a/support/download/scp > +++ b/support/download/scp > @@ -17,7 +17,7 @@ tmp_output="$( mktemp "${output}.XXXXXX" )" > > ret=1 > if ${SCP} "${url}" "${tmp_dl}"; then > - if mv "${tmp_dl}" "${tmp_output}"; then > + if cat "${tmp_dl}" >"${tmp_output}"; then > mv "${tmp_output}" "${output}" > ret=0 > fi > diff --git a/support/download/svn b/support/download/svn > index 39cbbcb..259d3ed 100755 > --- a/support/download/svn > +++ b/support/download/svn > @@ -33,7 +33,7 @@ rm -rf "${repodir}" > > ret=1 > if ${SVN} export "${repo}@${rev}" "${repodir}"; then > - if tar czf "${tmp_output}" "${repodir}"; then > + if tar czf - "${repodir}" >"${tmp_output}"; then Again, tar doesn't unlink so this isn't needed. Regards, Arnout > mv "${tmp_output}" "${output}" > ret=0 > fi > diff --git a/support/download/wget b/support/download/wget > index cbeca3b..0f71108 100755 > --- a/support/download/wget > +++ b/support/download/wget > @@ -25,7 +25,7 @@ tmp_output="$( mktemp "${output}.XXXXXX" )" > > ret=1 > if ${WGET} -O "${tmp_dl}" "${url}"; then > - if mv "${tmp_dl}" "${tmp_output}"; then > + if cat "${tmp_dl}" >"${tmp_output}"; then > mv "${tmp_output}" "${output}" > ret=0 > fi >
Arnout, All, On 2014-07-07 08:11 +0200, Arnout Vandecappelle spake thusly: > On 06/07/14 23:27, Yann E. MORIN wrote: > > When using 'mv' to move files to temp files, the existing temp file is > > forcibly removed by 'mv', and a new file is created. > > > > Although this should have little impact to us, there is still a race > > conditions in case two parallel downloads would step on each other's > > toes, and it goes like that: > > > > Process 1 Process 2 > > mktemp tmp_output > > mv tmp_dl tmp_output > > removes tmp_output > > mktemp tmp_ouput > > writes to tmp_output > > mv tmp_dl tmp_output > > removes tmp_output > > writes to tmp_output > > mv tmp_output output > > mv tmp_output output > > > > In this case, mktemp has the opportunity to create the same tmp_output > > temporary file, and we trash the content from one with the content > > from the other, and the last to do the final 'mv' breaks horibly. > > > > Instead, use 'cat', which guarantees that tmp_output is not removed > > before writing to it. > > Not that it makes a real difference, but I think that 'cp' is a more natural > way to do this. I am not sure how cp handles copying over an existing file. I'll check... > > This complexifies a bit the local-files (cp) helper, but better safe > > than sorry. > > > > (Note: this is a purely theoretical issue, I did not observe it yet, but > > it is slated to happen one day or the other, and will be very hard to > > debug then.) > > Actually, since the mktemp string is based on time and PID, the chance of this > happening is really vanishingly small. That said, better safe than sorry. Yes, the opportunity for a collision is very low, but it's not impossible. After all, that's the problem with race conditions: given sufficient time and trials, you'll hit it, and it is very hard to debug, especially in this case, since the condition is not reproducible (random names.) > > Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr> > > --- > > support/download/bzr | 2 +- > > support/download/cp | 9 ++++++--- > > support/download/cvs | 2 +- > > support/download/git | 4 ++-- > > support/download/hg | 2 +- > > support/download/scp | 2 +- > > support/download/svn | 2 +- > > support/download/wget | 2 +- > > 8 files changed, 14 insertions(+), 11 deletions(-) > > > > diff --git a/support/download/bzr b/support/download/bzr > > index 19d837d..f86fa82 100755 > > --- a/support/download/bzr > > +++ b/support/download/bzr > > @@ -27,7 +27,7 @@ tmp_output="$( mktemp "${output}.XXXXXX" )" > > > > ret=1 > > if ${BZR} export --format=tgz "${tmp_dl}" "${repo}" -r "${rev}"; then > > - if mv "${tmp_dl}" "${tmp_output}"; then > > + if cat "${tmp_dl}" >"${tmp_output}"; then > > mv "${tmp_output}" "${output}" > > ret=0 > > fi > > Not really related to this patch, but why do we need this ${tmp_dl} to begin > with? Especially since we're already "occupying" a tempfile in DL_DIR anyway. The idea was not to pollute BR2_DL_DIR, in case the download fails. Hence this dance: - download to a disposable area (BUILD_DIR); - move to a temp file in BR2_DL_DIR; - atomically rename to the final file. But granted, since we remove failed downloads, we can skip the intermediary temp file in BUILD_DIR (although I still do not like much the idea of polluting BR2_DL_DIR.) I think there is still a case for which we can't remove the temp file... Ah yes, if the user cancels a build with Ctrl-C, the script is aborted, so we leave a .XXXXXX file in BR2_DL_DIR. But even with the three-step dance above, there is still a small window for leaving out cruft in BR2_DL_DIR; but that window is much, much smaller than the download window. > Also, with this added complexity, the download helper scripts are becoming > quite similar. So wouldn't it be a good idea to refactor them? I think we already had this discussion back when I introduced the series: http://lists.busybox.net/pipermail/buildroot/2014-January/086826.html But for now, there are a few important fixes I'd like be applied before going further. > I'm thinking of > something like this: > > support/download/helper: > > #!/bin/bash > > # We want to catch any command failure, and exit immediately > set -e > > # Generic download helper > # Call it with: > # $1: specific download helper > # $2: output file > # ...: arguments for the specific download helper > > download="${1}" > output="${2}" > shift 2 > > tmp_output="$( mktemp ... )" > if "${download}" "${tmp_output}" "$@"; then > # Blah blah need things to be atomic blah blah > mv "${tmp_output}" "${output}" > fi Oh, you meant having a wrapper script above all the other helpers, and delegate to the helpers only the download, and handle the atomicity dance to the wrapper. Hmmm... Might be worth looking at, probably, yes. > I expect there will be even more common stuff between the download helpers in > the future, so it makes sense to have this. Well, for one, moving the hash check in that wrapper, for example... > > diff --git a/support/download/cp b/support/download/cp > > index 8f6bc06..e73159b 100755 > > --- a/support/download/cp > > +++ b/support/download/cp > > @@ -13,12 +13,15 @@ set -e > > source="${1}" > > output="${2}" > > > > +tmp_dl="$( mktemp "${BUILD_DIR}/.XXXXXX" )" > > tmp_output="$( mktemp "${output}.XXXXXX" )" > > > > ret=1 > > -if ${LOCALFILES} "${source}" "${tmp_output}"; then > > - mv "${tmp_output}" "${output}" > > - ret=0 > > +if ${LOCALFILES} "${source}" "${tmp_dl}"; then > > + if cat "${tmp_dl}" >"${tmp_output}"; then > > Same here, what's the point of ${tmp_dl}? Same answer. > > + mv "${tmp_output}" "${output}" > > + ret=0 > > + fi > > fi > > > > # Cleanup > > diff --git a/support/download/cvs b/support/download/cvs > > index 9aeed79..a8ab080 100755 > > --- a/support/download/cvs > > +++ b/support/download/cvs > > @@ -36,7 +36,7 @@ rm -rf "${repodir}" > > ret=1 > > if ${CVS} -z3 -d":pserver:anonymous@${repo}" \ > > co -d "${repodir}" -r ":${rev}" -P "${rawname}"; then > > - if tar czf "${tmp_output}" "${repodir}"; then > > + if tar czf - "${repodir}" >"${tmp_output}"; then > > mv "${tmp_output}" "${output}" > > ret=0 > > fi > > diff --git a/support/download/git b/support/download/git > > index badb491..b0031e5 100755 > > --- a/support/download/git > > +++ b/support/download/git > > @@ -43,8 +43,8 @@ fi > > > > ret=1 > > pushd "${repodir}" >/dev/null > > -if ${GIT} archive --prefix="${basename}/" -o "${tmp_tar}" \ > > - --format=tar "${cset}"; then > > +if ${GIT} archive --prefix="${basename}/" --format=tar "${cset}" \ > > + >"${tmp_tar}"; then > > What's the reason for this change? I've checked with strace, and git archive > doesn't seem to unlink, it just does open(O_TRUNC) like cp. Ah, so you did strace both? I straced scp, and it does unlink. Are we sure all cp and all tar we can encounter will all use open(O_TRUNC) ? It'd rather go on the safe side, and not assume much about this, especially since that behaviour may change in the future, or may not show in older versions, or different versions. > > if gzip -c "${tmp_tar}" >"${tmp_output}"; then > > mv "${tmp_output}" "${output}" > > ret=0 > > diff --git a/support/download/hg b/support/download/hg > > index 6e9e26b..8b36db9 100755 > > --- a/support/download/hg > > +++ b/support/download/hg > > @@ -35,7 +35,7 @@ ret=1 > > if ${HG} clone --noupdate --rev "${cset}" "${repo}" "${repodir}"; then > > if ${HG} archive --repository "${repodir}" --type tgz \ > > --prefix "${basename}" --rev "${cset}" \ > > - "${tmp_output}"; then > > + - >"${tmp_output}"; then > > I didn't check for hg, but I also don't expect it to unlink() first. In fact, > it's mv's behaviour of unlinking first that is surprising. Well, scp also unlinks first. And same answer, I don;t want to rely on such a behaviour. Redirecting works the same everywhere, though. > > mv "${tmp_output}" "${output}" > > ret=0 > > fi > > diff --git a/support/download/scp b/support/download/scp > > index d3aad43..e16ece5 100755 > > --- a/support/download/scp > > +++ b/support/download/scp > > @@ -17,7 +17,7 @@ tmp_output="$( mktemp "${output}.XXXXXX" )" > > > > ret=1 > > if ${SCP} "${url}" "${tmp_dl}"; then > > - if mv "${tmp_dl}" "${tmp_output}"; then > > + if cat "${tmp_dl}" >"${tmp_output}"; then > > mv "${tmp_output}" "${output}" > > ret=0 > > fi > > diff --git a/support/download/svn b/support/download/svn > > index 39cbbcb..259d3ed 100755 > > --- a/support/download/svn > > +++ b/support/download/svn > > @@ -33,7 +33,7 @@ rm -rf "${repodir}" > > > > ret=1 > > if ${SVN} export "${repo}@${rev}" "${repodir}"; then > > - if tar czf "${tmp_output}" "${repodir}"; then > > + if tar czf - "${repodir}" >"${tmp_output}"; then > > Again, tar doesn't unlink so this isn't needed. Ditto. Regards, Yann E. MORIN.
On 07/07/14 23:38, Yann E. MORIN wrote: > Arnout, All, > > On 2014-07-07 08:11 +0200, Arnout Vandecappelle spake thusly: >> On 06/07/14 23:27, Yann E. MORIN wrote: >>> When using 'mv' to move files to temp files, the existing temp file is >>> forcibly removed by 'mv', and a new file is created. >>> >>> Although this should have little impact to us, there is still a race >>> conditions in case two parallel downloads would step on each other's >>> toes, and it goes like that: >>> >>> Process 1 Process 2 >>> mktemp tmp_output >>> mv tmp_dl tmp_output >>> removes tmp_output >>> mktemp tmp_ouput >>> writes to tmp_output >>> mv tmp_dl tmp_output >>> removes tmp_output >>> writes to tmp_output >>> mv tmp_output output >>> mv tmp_output output >>> >>> In this case, mktemp has the opportunity to create the same tmp_output >>> temporary file, and we trash the content from one with the content >>> from the other, and the last to do the final 'mv' breaks horibly. >>> >>> Instead, use 'cat', which guarantees that tmp_output is not removed >>> before writing to it. >> >> Not that it makes a real difference, but I think that 'cp' is a more natural >> way to do this. > > I am not sure how cp handles copying over an existing file. I'll > check... It does. It only unlinks if open(O_TRUNC) fails. (Checked with strace for coreutils and my reading the source for busybox.) > >>> This complexifies a bit the local-files (cp) helper, but better safe >>> than sorry. >>> >>> (Note: this is a purely theoretical issue, I did not observe it yet, but >>> it is slated to happen one day or the other, and will be very hard to >>> debug then.) >> >> Actually, since the mktemp string is based on time and PID, the chance of this >> happening is really vanishingly small. That said, better safe than sorry. > > Yes, the opportunity for a collision is very low, but it's not > impossible. After all, that's the problem with race conditions: given > sufficient time and trials, you'll hit it, and it is very hard to debug, > especially in this case, since the condition is not reproducible (random > names.) Well, since mktemp uses /dev/urandom to generate the file pattern (I was wrong about time/pid, that's only if urandom is not available), the 6-character random number has a collision frequency of about 10^10. The chance that such a collision occurs at exactly the moment that we're doing two parallel downloads of the same source is pretty darn small - much smaller than the chance of a sha1 collision I think. > >>> Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr> >>> --- >>> support/download/bzr | 2 +- >>> support/download/cp | 9 ++++++--- >>> support/download/cvs | 2 +- >>> support/download/git | 4 ++-- >>> support/download/hg | 2 +- >>> support/download/scp | 2 +- >>> support/download/svn | 2 +- >>> support/download/wget | 2 +- >>> 8 files changed, 14 insertions(+), 11 deletions(-) >>> >>> diff --git a/support/download/bzr b/support/download/bzr >>> index 19d837d..f86fa82 100755 >>> --- a/support/download/bzr >>> +++ b/support/download/bzr >>> @@ -27,7 +27,7 @@ tmp_output="$( mktemp "${output}.XXXXXX" )" >>> >>> ret=1 >>> if ${BZR} export --format=tgz "${tmp_dl}" "${repo}" -r "${rev}"; then >>> - if mv "${tmp_dl}" "${tmp_output}"; then >>> + if cat "${tmp_dl}" >"${tmp_output}"; then >>> mv "${tmp_output}" "${output}" >>> ret=0 >>> fi >> >> Not really related to this patch, but why do we need this ${tmp_dl} to begin >> with? Especially since we're already "occupying" a tempfile in DL_DIR anyway. > > The idea was not to pollute BR2_DL_DIR, in case the download fails. > Hence this dance: > - download to a disposable area (BUILD_DIR); > - move to a temp file in BR2_DL_DIR; > - atomically rename to the final file. > > But granted, since we remove failed downloads, we can skip the > intermediary temp file in BUILD_DIR (although I still do not like much > the idea of polluting BR2_DL_DIR.) Well, now you pollute BR2_DL_DIR with ${tmp_output} instead of ${tmp_dl}. Oh, now I get it: if a download is interrupted, you'll leave the partial download in BUILD_DIR instead of DL_DIR. OK, sorry about the noise. > > I think there is still a case for which we can't remove the temp file... > Ah yes, if the user cancels a build with Ctrl-C, the script is aborted, > so we leave a .XXXXXX file in BR2_DL_DIR. Ctrl-C is SIGINT so it can be trap'ped. > > But even with the three-step dance above, there is still a small window > for leaving out cruft in BR2_DL_DIR; but that window is much, much > smaller than the download window. Ack. > >> Also, with this added complexity, the download helper scripts are becoming >> quite similar. So wouldn't it be a good idea to refactor them? > > I think we already had this discussion back when I introduced the series: > http://lists.busybox.net/pipermail/buildroot/2014-January/086826.html > > But for now, there are a few important fixes I'd like be applied before > going further. OK. > >> I'm thinking of >> something like this: >> >> support/download/helper: >> >> #!/bin/bash >> >> # We want to catch any command failure, and exit immediately >> set -e >> >> # Generic download helper >> # Call it with: >> # $1: specific download helper >> # $2: output file >> # ...: arguments for the specific download helper >> >> download="${1}" >> output="${2}" >> shift 2 >> >> tmp_output="$( mktemp ... )" >> if "${download}" "${tmp_output}" "$@"; then >> # Blah blah need things to be atomic blah blah >> mv "${tmp_output}" "${output}" >> fi > > Oh, you meant having a wrapper script above all the other helpers, and > delegate to the helpers only the download, and handle the atomicity > dance to the wrapper. Hmmm... Might be worth looking at, probably, yes. But if you have more urgent fixes, that's OK. > >> I expect there will be even more common stuff between the download helpers in >> the future, so it makes sense to have this. > > Well, for one, moving the hash check in that wrapper, for example... > [snip] >>> diff --git a/support/download/git b/support/download/git >>> index badb491..b0031e5 100755 >>> --- a/support/download/git >>> +++ b/support/download/git >>> @@ -43,8 +43,8 @@ fi >>> >>> ret=1 >>> pushd "${repodir}" >/dev/null >>> -if ${GIT} archive --prefix="${basename}/" -o "${tmp_tar}" \ >>> - --format=tar "${cset}"; then >>> +if ${GIT} archive --prefix="${basename}/" --format=tar "${cset}" \ >>> + >"${tmp_tar}"; then >> >> What's the reason for this change? I've checked with strace, and git archive >> doesn't seem to unlink, it just does open(O_TRUNC) like cp. > > Ah, so you did strace both? I straced scp, and it does unlink. > Are we sure all cp and all tar we can encounter will all use > open(O_TRUNC) ? Well, the unlink behaviour of mv is rather surprising if you ask me. Normally when you create a file that already exists, you'd want it to retain the attributes it has (owner, group, mod, acl, symlink, hardlinks). > > It'd rather go on the safe side, and not assume much about this, > especially since that behaviour may change in the future, or may not > show in older versions, or different versions. OK, fair enough. > >>> if gzip -c "${tmp_tar}" >"${tmp_output}"; then >>> mv "${tmp_output}" "${output}" >>> ret=0 >>> diff --git a/support/download/hg b/support/download/hg >>> index 6e9e26b..8b36db9 100755 >>> --- a/support/download/hg >>> +++ b/support/download/hg >>> @@ -35,7 +35,7 @@ ret=1 >>> if ${HG} clone --noupdate --rev "${cset}" "${repo}" "${repodir}"; then >>> if ${HG} archive --repository "${repodir}" --type tgz \ >>> --prefix "${basename}" --rev "${cset}" \ >>> - "${tmp_output}"; then >>> + - >"${tmp_output}"; then >> >> I didn't check for hg, but I also don't expect it to unlink() first. In fact, >> it's mv's behaviour of unlinking first that is surprising. > > Well, scp also unlinks first. No it doesn't... Or else I'm doing something really wrong with my strace... Regards, Arnout > > And same answer, I don;t want to rely on such a behaviour. Redirecting > works the same everywhere, though. > >>> mv "${tmp_output}" "${output}" >>> ret=0 >>> fi >>> diff --git a/support/download/scp b/support/download/scp >>> index d3aad43..e16ece5 100755 >>> --- a/support/download/scp >>> +++ b/support/download/scp >>> @@ -17,7 +17,7 @@ tmp_output="$( mktemp "${output}.XXXXXX" )" >>> >>> ret=1 >>> if ${SCP} "${url}" "${tmp_dl}"; then >>> - if mv "${tmp_dl}" "${tmp_output}"; then >>> + if cat "${tmp_dl}" >"${tmp_output}"; then >>> mv "${tmp_output}" "${output}" >>> ret=0 >>> fi >>> diff --git a/support/download/svn b/support/download/svn >>> index 39cbbcb..259d3ed 100755 >>> --- a/support/download/svn >>> +++ b/support/download/svn >>> @@ -33,7 +33,7 @@ rm -rf "${repodir}" >>> >>> ret=1 >>> if ${SVN} export "${repo}@${rev}" "${repodir}"; then >>> - if tar czf "${tmp_output}" "${repodir}"; then >>> + if tar czf - "${repodir}" >"${tmp_output}"; then >> >> Again, tar doesn't unlink so this isn't needed. > > Ditto. > > Regards, > Yann E. MORIN. >
Arnout, All, On 2014-07-08 18:42 +0200, Arnout Vandecappelle spake thusly: > On 07/07/14 23:38, Yann E. MORIN wrote: > > On 2014-07-07 08:11 +0200, Arnout Vandecappelle spake thusly: [--SNIP--] > >> Not that it makes a real difference, but I think that 'cp' is a more natural > >> way to do this. > > > > I am not sure how cp handles copying over an existing file. I'll > > check... > > It does. It only unlinks if open(O_TRUNC) fails. (Checked with strace for > coreutils and my reading the source for busybox.) I'll rework the entire series to take your and Jacmet's comments in consideration. Thanks for the reviews! :-) Regards, Yann E. MORIN.
Yann, Arnout, On Mon, 7 Jul 2014 23:38:02 +0200, Yann E. MORIN wrote: > > Not really related to this patch, but why do we need this ${tmp_dl} to begin > > with? Especially since we're already "occupying" a tempfile in DL_DIR anyway. > > The idea was not to pollute BR2_DL_DIR, in case the download fails. > Hence this dance: > - download to a disposable area (BUILD_DIR); > - move to a temp file in BR2_DL_DIR; > - atomically rename to the final file. And also because $(DL_DIR) and $(BUILD_DIR) might be on different filesystems, so the rename/move of the file from $(BUILD_DIR) (where it was downloaded) to $(DL_DIR) may not be atomic. Hence the idea is to: - Download in $(BUILD_DIR) - Move to a temporary file in $(DL_DIR). This operation may not be atomic if $(DL_DIR) is not on the same filesystem as $(BUILD_DIR) - Finally rename the temporary file to the expected file name in $(DL_DIR). Since we're in the same directory, it's guaranteed to be atomic. This allows to ensure that when the final file appears in $(DL_DIR), we're sure the download is finished, and that therefore concurrently executing Buildroot instances will either not see the downloaded file, or see a fully completed downloaded file. Or am I missing the point of the discussion here? Thanks, Thomas
Thomas, All, On 2014-07-09 09:45 +0200, Thomas Petazzoni spake thusly: > On Mon, 7 Jul 2014 23:38:02 +0200, Yann E. MORIN wrote: > > > > Not really related to this patch, but why do we need this ${tmp_dl} to begin > > > with? Especially since we're already "occupying" a tempfile in DL_DIR anyway. > > > > The idea was not to pollute BR2_DL_DIR, in case the download fails. > > Hence this dance: > > - download to a disposable area (BUILD_DIR); > > - move to a temp file in BR2_DL_DIR; > > - atomically rename to the final file. > > And also because $(DL_DIR) and $(BUILD_DIR) might be on different > filesystems, so the rename/move of the file from $(BUILD_DIR) (where it > was downloaded) to $(DL_DIR) may not be atomic. Hence the idea is to: > > - Download in $(BUILD_DIR) > > - Move to a temporary file in $(DL_DIR). This operation may not be > atomic if $(DL_DIR) is not on the same filesystem as $(BUILD_DIR) > > - Finally rename the temporary file to the expected file name in > $(DL_DIR). Since we're in the same directory, it's guaranteed to be > atomic. > > This allows to ensure that when the final file appears in $(DL_DIR), > we're sure the download is finished, and that therefore concurrently > executing Buildroot instances will either not see the downloaded file, > or see a fully completed downloaded file. > > Or am I missing the point of the discussion here? Not as far as I'm concerned. Except that Arnout's (and Peter's) concerns where that the way I implemented that was not optimal, and I used convoluted constructs to achieve this non-cluttering and atomicity. I have to re-write my copy! ;-) Regards, Yann E. MORIN.
diff --git a/support/download/bzr b/support/download/bzr index 19d837d..f86fa82 100755 --- a/support/download/bzr +++ b/support/download/bzr @@ -27,7 +27,7 @@ tmp_output="$( mktemp "${output}.XXXXXX" )" ret=1 if ${BZR} export --format=tgz "${tmp_dl}" "${repo}" -r "${rev}"; then - if mv "${tmp_dl}" "${tmp_output}"; then + if cat "${tmp_dl}" >"${tmp_output}"; then mv "${tmp_output}" "${output}" ret=0 fi diff --git a/support/download/cp b/support/download/cp index 8f6bc06..e73159b 100755 --- a/support/download/cp +++ b/support/download/cp @@ -13,12 +13,15 @@ set -e source="${1}" output="${2}" +tmp_dl="$( mktemp "${BUILD_DIR}/.XXXXXX" )" tmp_output="$( mktemp "${output}.XXXXXX" )" ret=1 -if ${LOCALFILES} "${source}" "${tmp_output}"; then - mv "${tmp_output}" "${output}" - ret=0 +if ${LOCALFILES} "${source}" "${tmp_dl}"; then + if cat "${tmp_dl}" >"${tmp_output}"; then + mv "${tmp_output}" "${output}" + ret=0 + fi fi # Cleanup diff --git a/support/download/cvs b/support/download/cvs index 9aeed79..a8ab080 100755 --- a/support/download/cvs +++ b/support/download/cvs @@ -36,7 +36,7 @@ rm -rf "${repodir}" ret=1 if ${CVS} -z3 -d":pserver:anonymous@${repo}" \ co -d "${repodir}" -r ":${rev}" -P "${rawname}"; then - if tar czf "${tmp_output}" "${repodir}"; then + if tar czf - "${repodir}" >"${tmp_output}"; then mv "${tmp_output}" "${output}" ret=0 fi diff --git a/support/download/git b/support/download/git index badb491..b0031e5 100755 --- a/support/download/git +++ b/support/download/git @@ -43,8 +43,8 @@ fi ret=1 pushd "${repodir}" >/dev/null -if ${GIT} archive --prefix="${basename}/" -o "${tmp_tar}" \ - --format=tar "${cset}"; then +if ${GIT} archive --prefix="${basename}/" --format=tar "${cset}" \ + >"${tmp_tar}"; then if gzip -c "${tmp_tar}" >"${tmp_output}"; then mv "${tmp_output}" "${output}" ret=0 diff --git a/support/download/hg b/support/download/hg index 6e9e26b..8b36db9 100755 --- a/support/download/hg +++ b/support/download/hg @@ -35,7 +35,7 @@ ret=1 if ${HG} clone --noupdate --rev "${cset}" "${repo}" "${repodir}"; then if ${HG} archive --repository "${repodir}" --type tgz \ --prefix "${basename}" --rev "${cset}" \ - "${tmp_output}"; then + - >"${tmp_output}"; then mv "${tmp_output}" "${output}" ret=0 fi diff --git a/support/download/scp b/support/download/scp index d3aad43..e16ece5 100755 --- a/support/download/scp +++ b/support/download/scp @@ -17,7 +17,7 @@ tmp_output="$( mktemp "${output}.XXXXXX" )" ret=1 if ${SCP} "${url}" "${tmp_dl}"; then - if mv "${tmp_dl}" "${tmp_output}"; then + if cat "${tmp_dl}" >"${tmp_output}"; then mv "${tmp_output}" "${output}" ret=0 fi diff --git a/support/download/svn b/support/download/svn index 39cbbcb..259d3ed 100755 --- a/support/download/svn +++ b/support/download/svn @@ -33,7 +33,7 @@ rm -rf "${repodir}" ret=1 if ${SVN} export "${repo}@${rev}" "${repodir}"; then - if tar czf "${tmp_output}" "${repodir}"; then + if tar czf - "${repodir}" >"${tmp_output}"; then mv "${tmp_output}" "${output}" ret=0 fi diff --git a/support/download/wget b/support/download/wget index cbeca3b..0f71108 100755 --- a/support/download/wget +++ b/support/download/wget @@ -25,7 +25,7 @@ tmp_output="$( mktemp "${output}.XXXXXX" )" ret=1 if ${WGET} -O "${tmp_dl}" "${url}"; then - if mv "${tmp_dl}" "${tmp_output}"; then + if cat "${tmp_dl}" >"${tmp_output}"; then mv "${tmp_output}" "${output}" ret=0 fi
When using 'mv' to move files to temp files, the existing temp file is forcibly removed by 'mv', and a new file is created. Although this should have little impact to us, there is still a race conditions in case two parallel downloads would step on each other's toes, and it goes like that: Process 1 Process 2 mktemp tmp_output mv tmp_dl tmp_output removes tmp_output mktemp tmp_ouput writes to tmp_output mv tmp_dl tmp_output removes tmp_output writes to tmp_output mv tmp_output output mv tmp_output output In this case, mktemp has the opportunity to create the same tmp_output temporary file, and we trash the content from one with the content from the other, and the last to do the final 'mv' breaks horibly. Instead, use 'cat', which guarantees that tmp_output is not removed before writing to it. This complexifies a bit the local-files (cp) helper, but better safe than sorry. (Note: this is a purely theoretical issue, I did not observe it yet, but it is slated to happen one day or the other, and will be very hard to debug then.) Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr> --- support/download/bzr | 2 +- support/download/cp | 9 ++++++--- support/download/cvs | 2 +- support/download/git | 4 ++-- support/download/hg | 2 +- support/download/scp | 2 +- support/download/svn | 2 +- support/download/wget | 2 +- 8 files changed, 14 insertions(+), 11 deletions(-)