diff mbox

pkg-download: fix fetching URLs with ? from PRIMARY/SECONDARY

Message ID 20161021202715.16528-1-arnout@mind.be
State Accepted
Headers show

Commit Message

Arnout Vandecappelle Oct. 21, 2016, 8:27 p.m. UTC
Some packages download files (especially patches) with a ? in the
URL. The ? marks the query part of the URL. However, the downloaded
file still contains the ? but from then on it doesn't designate a
query part anymore. Therefore, when fetching from PRIMARY or
SECONDARY site over http, the server will report a 404 Not Found.

To fix, we need to replace the ? with %3F. Obviously, this should
be done only when fetching from PRIMARY or SECONDARY. For fetching
from the real upstream, the ? really does designate the query part.

Fixes #9371.

Reported-by: Johan Derycke <johanderycke@gmail.com>
Signed-off-by: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be>
---
I tested with
make BR2_PRIMARY_SITE=http://localhost/dl BR2_PRIMARY_SITE_ONLY=y BR2_DL_DIR=/tmp/dl openssl
and also checked with a file:// primary site, with secondary site, and
openssl-source-check
---
 package/pkg-download.mk | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

Thomas Petazzoni Oct. 22, 2016, 1:04 p.m. UTC | #1
Hello,

There is nothing named "SECONDARY" in Buildroot, it's called "BACKUP".
This can be fixed when applying the patch.

On Fri, 21 Oct 2016 22:27:15 +0200, Arnout Vandecappelle
(Essensium/Mind) wrote:
> Some packages download files (especially patches) with a ? in the
> URL. The ? marks the query part of the URL. However, the downloaded
> file still contains the ? but from then on it doesn't designate a
> query part anymore. Therefore, when fetching from PRIMARY or
> SECONDARY site over http, the server will report a 404 Not Found.
> 
> To fix, we need to replace the ? with %3F. Obviously, this should
> be done only when fetching from PRIMARY or SECONDARY. For fetching
> from the real upstream, the ? really does designate the query part.

One thing I'm wondering is if the '?' character is the only one that
needs to be escaped like this, or not.

Thanks,

Thomas
Peter Korsgaard Oct. 23, 2016, 9:09 a.m. UTC | #2
>>>>> "Thomas" == Thomas Petazzoni <thomas.petazzoni@free-electrons.com> writes:

 > Hello,
 > There is nothing named "SECONDARY" in Buildroot, it's called "BACKUP".
 > This can be fixed when applying the patch.

 > On Fri, 21 Oct 2016 22:27:15 +0200, Arnout Vandecappelle
 > (Essensium/Mind) wrote:
 >> Some packages download files (especially patches) with a ? in the
 >> URL. The ? marks the query part of the URL. However, the downloaded
 >> file still contains the ? but from then on it doesn't designate a
 >> query part anymore. Therefore, when fetching from PRIMARY or
 >> SECONDARY site over http, the server will report a 404 Not Found.
 >> 
 >> To fix, we need to replace the ? with %3F. Obviously, this should
 >> be done only when fetching from PRIMARY or SECONDARY. For fetching
 >> from the real upstream, the ? really does designate the query part.

 > One thing I'm wondering is if the '?' character is the only one that
 > needs to be escaped like this, or not.

I believe it is, as we already quote the URLs correctly so they don't
confuse the shell and wget afaik correctly URL encodes URL before
requesting the location from the web server.

It works here at least with lighttpd:

echo ok > '/var/www/url?with&special!characters in'

wget 'http://127.0.0.1/url?with&special!characters in'
--2016-10-23 11:04:49--  http://127.0.0.1/url?with&special!characters%20in
Connecting to 127.0.0.1:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2016-10-23 11:04:49 ERROR 404: Not Found.

wget 'http://127.0.0.1/url%3fwith&special!characters in'
--2016-10-23 11:04:27--  http://127.0.0.1/url%3fwith&special!characters%20in
Connecting to 127.0.0.1:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3 [application/octet-stream]
Saving to: ‘url?with&special!characters in’

2016-10-23 11:04:27 (316 KB/s) - ‘url?with&special!characters in’ saved [3/3]


We have seen this problem on sources.buildroot.org as well. I have so
far manually fixed it up by renaming the files to drop ?.., so that will
need to be undone when this gets applied.
Thomas Petazzoni Oct. 23, 2016, 12:54 p.m. UTC | #3
Hello,

On Fri, 21 Oct 2016 22:27:15 +0200, Arnout Vandecappelle
(Essensium/Mind) wrote:
> Some packages download files (especially patches) with a ? in the
> URL. The ? marks the query part of the URL. However, the downloaded
> file still contains the ? but from then on it doesn't designate a
> query part anymore. Therefore, when fetching from PRIMARY or
> SECONDARY site over http, the server will report a 404 Not Found.
> 
> To fix, we need to replace the ? with %3F. Obviously, this should
> be done only when fetching from PRIMARY or SECONDARY. For fetching
> from the real upstream, the ? really does designate the query part.
> 
> Fixes #9371.
> 
> Reported-by: Johan Derycke <johanderycke@gmail.com>
> Signed-off-by: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be>
> ---
> I tested with
> make BR2_PRIMARY_SITE=http://localhost/dl BR2_PRIMARY_SITE_ONLY=y BR2_DL_DIR=/tmp/dl openssl
> and also checked with a file:// primary site, with secondary site, and
> openssl-source-check
> ---
>  package/pkg-download.mk | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)

Applied to master, after s/SECONDARY/BACKUP/.

Peter can you take the appropriate actions in sources.b.o ?

Thanks!

Thomas
Arnout Vandecappelle Oct. 23, 2016, 3:21 p.m. UTC | #4
On 22-10-16 15:04, Thomas Petazzoni wrote:
> Hello,
> 
> There is nothing named "SECONDARY" in Buildroot, it's called "BACKUP".
> This can be fixed when applying the patch.
> 
> On Fri, 21 Oct 2016 22:27:15 +0200, Arnout Vandecappelle
> (Essensium/Mind) wrote:
>> Some packages download files (especially patches) with a ? in the
>> URL. The ? marks the query part of the URL. However, the downloaded
>> file still contains the ? but from then on it doesn't designate a
>> query part anymore. Therefore, when fetching from PRIMARY or
>> SECONDARY site over http, the server will report a 404 Not Found.
>>
>> To fix, we need to replace the ? with %3F. Obviously, this should
>> be done only when fetching from PRIMARY or SECONDARY. For fetching
>> from the real upstream, the ? really does designate the query part.
> 
> One thing I'm wondering is if the '?' character is the only one that
> needs to be escaped like this, or not.

 From wikipedia, an URL is:

scheme:[//[user:password@]host[:port]][/]path[?query][#fragment]

so after the path, only ? and # are interpreted differently between the normal
download and the PRIMARY/BACKUP download. You might think that & (or ;) is also
special, but those characters are only special when they appear _after_ '?'.
Just to be sure I tested a ?...&... scenario (just adding \&foo after one of the
openssl patches) and with this patch it works correctly both for normal and
PRIMARY downloads. [1]

 As to #fragment, that's very simple: it can never occur because it's
interpreted by the client and doesn't get sent to the server. If a # ends up in
a URL somehow, it already has to be escaped anyway or it won't get passed by wget.

 Note: I haven't tested a PRIMARY that has scp or file download method, but
those are not affected by this patch anyway.

 Regards,
 Arnout

[1]  Well, actually it doesn't work at all, our download handling fails
dramatically when there is a & in the URL. I tried fixing it by adding quoting
here and there, but it's not exactly trivial. Since we don't have URLs with &
and I don't expect any, I'm not continuing work on this. But at least using our
dl-wrapper works for a URL like ...%3F...&...
Arnout Vandecappelle Oct. 23, 2016, 3:23 p.m. UTC | #5
On 23-10-16 14:54, Thomas Petazzoni wrote:
> Hello,
> 
> On Fri, 21 Oct 2016 22:27:15 +0200, Arnout Vandecappelle
> (Essensium/Mind) wrote:
>> Some packages download files (especially patches) with a ? in the
>> URL. The ? marks the query part of the URL. However, the downloaded
>> file still contains the ? but from then on it doesn't designate a
>> query part anymore. Therefore, when fetching from PRIMARY or
>> SECONDARY site over http, the server will report a 404 Not Found.
>>
>> To fix, we need to replace the ? with %3F. Obviously, this should
>> be done only when fetching from PRIMARY or SECONDARY. For fetching
>> from the real upstream, the ? really does designate the query part.
>>
>> Fixes #9371.
>>
>> Reported-by: Johan Derycke <johanderycke@gmail.com>
>> Signed-off-by: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be>
>> ---
>> I tested with
>> make BR2_PRIMARY_SITE=http://localhost/dl BR2_PRIMARY_SITE_ONLY=y BR2_DL_DIR=/tmp/dl openssl
>> and also checked with a file:// primary site, with secondary site, and
>> openssl-source-check
>> ---
>>  package/pkg-download.mk | 8 ++++++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> Applied to master, after s/SECONDARY/BACKUP/.
> 
> Peter can you take the appropriate actions in sources.b.o ?

 There's nothing to do AFAICS. I did test a download of the openssl patches from
sources.buildroot.net and that worked.

 Regards,
 Arnout
Peter Korsgaard Oct. 23, 2016, 6:31 p.m. UTC | #6
>>>>> "Thomas" == Thomas Petazzoni <thomas.petazzoni@free-electrons.com> writes:

Hi,

 >> package/pkg-download.mk | 8 ++++++--
 >> 1 file changed, 6 insertions(+), 2 deletions(-)

 > Applied to master, after s/SECONDARY/BACKUP/.

 > Peter can you take the appropriate actions in sources.b.o ?

Yes. From a quick look it seems I copied files instead of renaming them,
so everything should be good (I'll leave the copies for older BR
versions).
diff mbox

Patch

diff --git a/package/pkg-download.mk b/package/pkg-download.mk
index 0f542e6..65e753c 100644
--- a/package/pkg-download.mk
+++ b/package/pkg-download.mk
@@ -210,6 +210,10 @@  endef
 #
 # E.G. use like this:
 # $(call DOWNLOAD,$(FOO_SITE))
+#
+# For PRIMARY and SECONDARY site, any ? in the URL is replaced by %3F. A ? in
+# the URL is used to separate query arguments, but the PRIMARY and SECONDARY
+# sites serve just plain files.
 ################################################################################
 
 define DOWNLOAD
@@ -226,7 +230,7 @@  define DOWNLOAD_INNER
 		case "$(call geturischeme,$(BR2_PRIMARY_SITE))" in \
 			file) $(call $(3)_LOCALFILES,$(BR2_PRIMARY_SITE)/$(2),$(2)) && exit ;; \
 			scp) $(call $(3)_SCP,$(BR2_PRIMARY_SITE)/$(2),$(2)) && exit ;; \
-			*) $(call $(3)_WGET,$(BR2_PRIMARY_SITE)/$(2),$(2)) && exit ;; \
+			*) $(call $(3)_WGET,$(BR2_PRIMARY_SITE)/$(subst ?,%3F,$(2)),$(2)) && exit ;; \
 		esac ; \
 	fi ; \
 	if test "$(BR2_PRIMARY_SITE_ONLY)" = "y" ; then \
@@ -245,7 +249,7 @@  define DOWNLOAD_INNER
 		esac ; \
 	fi ; \
 	if test -n "$(call qstrip,$(BR2_BACKUP_SITE))" ; then \
-		$(call $(3)_WGET,$(BR2_BACKUP_SITE)/$(2),$(2)) && exit ; \
+		$(call $(3)_WGET,$(BR2_BACKUP_SITE)/$(subst ?,%3F,$(2)),$(2)) && exit ; \
 	fi ; \
 	exit 1
 endef