Improve pull request URL matching regex

Message ID 20191111222741.u77idj6ijpljvetx@chatter.i7.local
State Superseded
Series
  • Improve pull request URL matching regex

Commit Message

Konstantin Ryabitsev Nov. 11, 2019, 10:27 p.m. UTC
The existing regex missed several important cases, such as:

- tag/branch info wrapping to the next line, e.g.:

----
are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux.git/
tags/v5.4-next-soc

----
(see example: https://patchwork.kernel.org/patch/11236893/)

- tag/branch info being wrapped to the next line with a backslash, e.g.:

----
are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux.git/ \
  tags/v5.4-next-soc

----
(no example, but I've seen this before)

The proposed change handles both of these edge cases.

Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
---
 patchwork/parser.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


base-commit: 239fbd2ca1bf140bc61fdee922944624b23c812c

Comments

Andrew Donnellan Nov. 14, 2019, 4:58 a.m. UTC | #1
On 12/11/19 9:27 am, Konstantin Ryabitsev wrote:
> The existing regex missed several important cases, such as:
> 
> - tag/branch info wrapping to the next line, e.g.:
> 
> ----
> are available in the Git repository at:
> 
>    https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux.git/
> tags/v5.4-next-soc
> 
> ----
> (see example: https://patchwork.kernel.org/patch/11236893/)
> 
> - tag/branch info being wrapped to the next line with a backslash, e.g.:
> 
> ----
> are available in the Git repository at:
> 
>    https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux.git/ \
>    tags/v5.4-next-soc
> 
> ----
> (no example, but I've seen this before)
> 
> The proposed change handles both of these edge cases.
> 
> Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>

This needs a test :) It should be as simple as adding the example mails
you link to under patchwork/tests/mail, and then adding a couple of
one-line test cases to PatchParseTest in patchwork/tests/test_parser.py.
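
As a rough illustration of the kind of test cases meant here, the sketch
below is self-contained rather than wired into the real test suite. The
function is a stand-in copy of the patched parse_pull_request() (with the
substitution pattern written as a raw string), and pull_mail(), the class
name PullRequestUrlTest, the commit ids and the subject line are all
hypothetical. Real tests would load mbox files from patchwork/tests/mail
instead.

```python
import re
import unittest


def parse_pull_request(content):
    """Stand-in copy of the patched function so the sketch runs on its own."""
    git_re = re.compile(
        r'^The following changes since commit.*'
        r'^are available in the git repository at:\n'
        r'^\s*([\w+-]+(?:://|@)[\w/.@:~-]+[\s\\]*[\w/._-]*)\s*$',
        re.DOTALL | re.MULTILINE | re.IGNORECASE)
    match = git_re.search(content)
    if match:
        return re.sub(r'\s+', ' ', match.group(1)).strip()
    return None


URL = 'https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux.git/'
TAG = 'tags/v5.4-next-soc'


def pull_mail(url_block):
    """Hypothetical helper: wrap url_block in request-pull style boilerplate."""
    return ('The following changes since commit 0123abcd:\n\n'
            '  Example subject\n\n'
            'are available in the Git repository at:\n\n'
            + url_block + '\n\n'
            'for you to fetch changes up to 4567ef01:\n')


class PullRequestUrlTest(unittest.TestCase):

    def test_single_line(self):
        # URL and ref on one line, separated by a space.
        mail = pull_mail('  ' + URL + ' ' + TAG)
        self.assertEqual(parse_pull_request(mail), URL + ' ' + TAG)

    def test_ref_wrapped_to_next_line(self):
        # The ref has been wrapped onto the following line.
        mail = pull_mail('  ' + URL + '\n' + TAG)
        self.assertEqual(parse_pull_request(mail), URL + ' ' + TAG)

    def test_ref_wrapped_with_backslash(self):
        # The line break is escaped with a backslash. Only the two ends of
        # the result are checked; the backslash is not asserted on.
        mail = pull_mail('  ' + URL + ' \\\n  ' + TAG)
        result = parse_pull_request(mail)
        self.assertTrue(result.startswith(URL))
        self.assertTrue(result.endswith(TAG))

    def test_no_pull_request(self):
        # A mail without the boilerplate yields no URL.
        self.assertIsNone(parse_pull_request('just a regular patch\n'))
```

Saved to a file, this can be run with "python -m unittest <file>".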

> ---
>   patchwork/parser.py | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/patchwork/parser.py b/patchwork/parser.py
> index c794f09..d25c0df 100644
> --- a/patchwork/parser.py
> +++ b/patchwork/parser.py
> @@ -939,11 +939,11 @@ def parse_patch(content):
>   def parse_pull_request(content):
>       git_re = re.compile(r'^The following changes since commit.*'
>                           r'^are available in the git repository at:\n'
> -                        r'^\s*([\S]+://[^\n]+)$',
> +                        r'^\s*([\w+-]+(?:://|@)[\w/.@:~-]+[\s\\]*[\w/._-]*)\s*$',
>                           re.DOTALL | re.MULTILINE | re.IGNORECASE)
>       match = git_re.search(content)
>       if match:
> -        return match.group(1)
> +        return re.sub(r'\s+', ' ', match.group(1)).strip()
>       return None
>   
>   
> 
> base-commit: 239fbd2ca1bf140bc61fdee922944624b23c812c
>

Patch

diff --git a/patchwork/parser.py b/patchwork/parser.py
index c794f09..d25c0df 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -939,11 +939,11 @@  def parse_patch(content):
 def parse_pull_request(content):
     git_re = re.compile(r'^The following changes since commit.*'
                         r'^are available in the git repository at:\n'
-                        r'^\s*([\S]+://[^\n]+)$',
+                        r'^\s*([\w+-]+(?:://|@)[\w/.@:~-]+[\s\\]*[\w/._-]*)\s*$',
                         re.DOTALL | re.MULTILINE | re.IGNORECASE)
     match = git_re.search(content)
     if match:
-        return match.group(1)
+        return re.sub(r'\s+', ' ', match.group(1)).strip()
     return None
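
To make the effect of the change concrete, here is a minimal sketch that
runs the old and the new pattern against the first wrapped layout from the
commit message. It is only an illustration: the names OLD_RE, NEW_RE,
old_parse and new_parse are made up for this example, and the mail body,
commit ids and subject line are placeholders, not taken from the linked
patch.

```python
import re

FLAGS = re.DOTALL | re.MULTILINE | re.IGNORECASE
PREFIX = (r'^The following changes since commit.*'
          r'^are available in the git repository at:\n')

# Pattern before the patch: the match ends at the end of the URL's line.
OLD_RE = re.compile(PREFIX + r'^\s*([\S]+://[^\n]+)$', FLAGS)
# Pattern from the patch: a ref may follow after whitespace or a backslash.
NEW_RE = re.compile(
    PREFIX + r'^\s*([\w+-]+(?:://|@)[\w/.@:~-]+[\s\\]*[\w/._-]*)\s*$', FLAGS)

URL = 'https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux.git/'

# Hypothetical mail body; commit ids and subject are placeholders.
WRAPPED_MAIL = (
    'The following changes since commit 0123abcd:\n'
    '\n'
    '  Example subject\n'
    '\n'
    'are available in the Git repository at:\n'
    '\n'
    '  ' + URL + '\n'
    'tags/v5.4-next-soc\n'
    '\n'
    'for you to fetch changes up to 4567ef01:\n'
)


def old_parse(content):
    """Behaviour before the patch: return the captured line as-is."""
    match = OLD_RE.search(content)
    return match.group(1) if match else None


def new_parse(content):
    """Behaviour after the patch: collapse whitespace runs to one space."""
    match = NEW_RE.search(content)
    if match:
        return re.sub(r'\s+', ' ', match.group(1)).strip()
    return None


print(old_parse(WRAPPED_MAIL))  # URL only, the wrapped ref is lost
print(new_parse(WRAPPED_MAIL))  # URL and ref joined by a single space
```

In this sketch the old pattern stops at the end of the URL line, so the
tag is dropped, while the new pattern captures across the line break and
the substitution folds the newline into a single space.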