diff mbox

[kteam-tools] ktl: Don't treat all strings matching "#[0-9]+" as a bug numbers

Message ID 1470756370-136415-1-git-send-email-seth.forshee@canonical.com
State New
Headers show

Commit Message

Seth Forshee Aug. 9, 2016, 3:26 p.m. UTC
The bug number regular expression matching correctly identifies
lines containing launchpad bug numbers, but once such a line is
identified it treats any string matching "#[0-9]+" as a bug
number. For example, in this changelog entry #1 is identified as
a bug number:

  * [LTCTest][Opal][OP820] Machine crashed with Oops: Kernel access of bad area,
    sig: 11 [#1] while executing Froze PE Error injection (LP: #1603449)

Rather than matching the "LP: #NNNNNN" string and the bug number
separately, they can be combined into one regex which matches
against the full string but returns only the bug number portion
of the string. Since re.findall will return an empty list if
there are no matches, all that is needed is to append the list
returned by findall with this regex to the bug list for each line
of the changelog.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
---
 ktl/debian.py | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

Comments

Stefan Bader Aug. 9, 2016, 4:08 p.m. UTC | #1
Is working for me...
Kamal Mostafa Aug. 10, 2016, 2:34 p.m. UTC | #2
On Tue, Aug 09, 2016 at 10:26:10AM -0500, Seth Forshee wrote:
> The bug number regular expression matching correctly identifies
> lines containing launchpad bug numbers, but once such a line is
> identified it treats any string matching "#[0-9]+" as a bug
> number. For example, in this changelog entry #1 is identified as
> a bug number:
> 
>   * [LTCTest][Opal][OP820] Machine crashed with Oops: Kernel access of bad area,
>     sig: 11 [#1] while executing Froze PE Error injection (LP: #1603449)
> 
> Rather than matching the "LP: #NNNNNN" string and the bug number
> separately, they can be combined into one regex which matches
> against the full string but returns only the bug number portion
> of the string.

I believe the current implementation is designed to work wth changelog
entries like the following, which would be broken by your patch:

linux/xenial$  egrep 'LP: #[0-9].*#' debian.master/changelog
    - LP: #1395877, #1410480
    - LP: #1395877, #1410480
    - LP: #1310512, #1320070
    - LP: #1085766, #462111
    - LP: #1017717, #225
    - LP: #978038, #987371
    - LP: #915941, #918212
    - LP: #915941, #918212
    - LP: #907377, #911236
    - LP: #737388, #782389, #794642
    [...]

How about implementing it like this?:

First trim away everything up to and including the first "LP:" from
the line buffer, then parse every remaining instance of '#([0-9]+)' and
consider each to be an LP bug number.

 -Kamal


> Since re.findall will return an empty list if
> there are no matches, all that is needed is to append the list
> returned by findall with this regex to the bug list for each line
> of the changelog.
> 
> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> ---
>  ktl/debian.py | 10 +++-------
>  1 file changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/ktl/debian.py b/ktl/debian.py
> index dd2703a..76435aa 100644
> --- a/ktl/debian.py
> +++ b/ktl/debian.py
> @@ -30,8 +30,7 @@ class Debian:
>  
>      package_rc = compile("^(linux[-\S])*.*$")
>      ver_rc     = compile("^linux[-\S]* \(([0-9]+\.[0-9]+\.[0-9]+[-\.][0-9]+\.[0-9]+[~a-z0-9]*)\).*$")
> -    bug_line_rc = compile("LP:\s#[0-9]+")
> -    bug_rc = compile("#([0-9]+)")
> +    bug_rc = compile("LP:\s#([0-9]+)")
>  
>      # debian_directories
>      #
> @@ -176,11 +175,8 @@ class Debian:
>                  bugs = []
>              else:
>                  # find bug numbers and append them to the list
> -                bug_line_matches = cls.bug_line_rc.search(line)
> -                if bug_line_matches:
> -                    bug_matches = findall(cls.bug_rc,line)
> -                    if bug_matches:
> -                        bugs.extend( bug_matches )
> +                bug_matches = findall(cls.bug_rc,line)
> +                bugs.extend(bug_matches)
>  
>                  content.append(line)
>  
> -- 
> 2.7.4
> 
> 
> -- 
> kernel-team mailing list
> kernel-team@lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
Seth Forshee Aug. 10, 2016, 3:30 p.m. UTC | #3
On Wed, Aug 10, 2016 at 07:34:51AM -0700, Kamal Mostafa wrote:
> On Tue, Aug 09, 2016 at 10:26:10AM -0500, Seth Forshee wrote:
> > The bug number regular expression matching correctly identifies
> > lines containing launchpad bug numbers, but once such a line is
> > identified it treats any string matching "#[0-9]+" as a bug
> > number. For example, in this changelog entry #1 is identified as
> > a bug number:
> > 
> >   * [LTCTest][Opal][OP820] Machine crashed with Oops: Kernel access of bad area,
> >     sig: 11 [#1] while executing Froze PE Error injection (LP: #1603449)
> > 
> > Rather than matching the "LP: #NNNNNN" string and the bug number
> > separately, they can be combined into one regex which matches
> > against the full string but returns only the bug number portion
> > of the string.
> 
> I believe the current implementation is designed to work wth changelog
> entries like the following, which would be broken by your patch:
> 
> linux/xenial$  egrep 'LP: #[0-9].*#' debian.master/changelog
>     - LP: #1395877, #1410480
>     - LP: #1395877, #1410480
>     - LP: #1310512, #1320070
>     - LP: #1085766, #462111
>     - LP: #1017717, #225
>     - LP: #978038, #987371
>     - LP: #915941, #918212
>     - LP: #915941, #918212
>     - LP: #907377, #911236
>     - LP: #737388, #782389, #794642
>     [...]
> 
> How about implementing it like this?:
> 
> First trim away everything up to and including the first "LP:" from
> the line buffer, then parse every remaining instance of '#([0-9]+)' and
> consider each to be an LP bug number.

Okay, I didn't realize they could be structured that way. In that case
your suggestion makes more sense. Hopefully it's not valid for free form
text to follow a "LP: #NNNNNNN" tag?

Thanks,
Seth
Kamal Mostafa Aug. 10, 2016, 3:38 p.m. UTC | #4
On Wed, Aug 10, 2016 at 10:30:39AM -0500, Seth Forshee wrote:
> On Wed, Aug 10, 2016 at 07:34:51AM -0700, Kamal Mostafa wrote:
> > On Tue, Aug 09, 2016 at 10:26:10AM -0500, Seth Forshee wrote:
> > > The bug number regular expression matching correctly identifies
> > > lines containing launchpad bug numbers, but once such a line is
> > > identified it treats any string matching "#[0-9]+" as a bug
> > > number. For example, in this changelog entry #1 is identified as
> > > a bug number:
> > > 
> > >   * [LTCTest][Opal][OP820] Machine crashed with Oops: Kernel access of bad area,
> > >     sig: 11 [#1] while executing Froze PE Error injection (LP: #1603449)
> > > 
> > > Rather than matching the "LP: #NNNNNN" string and the bug number
> > > separately, they can be combined into one regex which matches
> > > against the full string but returns only the bug number portion
> > > of the string.
> > 
> > I believe the current implementation is designed to work wth changelog
> > entries like the following, which would be broken by your patch:
> > 
> > linux/xenial$  egrep 'LP: #[0-9].*#' debian.master/changelog
> >     - LP: #1395877, #1410480
> >     - LP: #1395877, #1410480
> >     - LP: #1310512, #1320070
> >     - LP: #1085766, #462111
> >     - LP: #1017717, #225
> >     - LP: #978038, #987371
> >     - LP: #915941, #918212
> >     - LP: #915941, #918212
> >     - LP: #907377, #911236
> >     - LP: #737388, #782389, #794642
> >     [...]
> > 
> > How about implementing it like this?:
> > 
> > First trim away everything up to and including the first "LP:" from
> > the line buffer, then parse every remaining instance of '#([0-9]+)' and
> > consider each to be an LP bug number.
> 
> Okay, I didn't realize they could be structured that way. In that case
> your suggestion makes more sense. Hopefully it's not valid for free form
> text to follow a "LP: #NNNNNNN" tag?

I don't find any instances of that (extraneous free form text) in our
changelogs -- just lines of the form:

    - LP: #737388, #782389, #794642

 -Kamal
diff mbox

Patch

diff --git a/ktl/debian.py b/ktl/debian.py
index dd2703a..76435aa 100644
--- a/ktl/debian.py
+++ b/ktl/debian.py
@@ -30,8 +30,7 @@  class Debian:
 
     package_rc = compile("^(linux[-\S])*.*$")
     ver_rc     = compile("^linux[-\S]* \(([0-9]+\.[0-9]+\.[0-9]+[-\.][0-9]+\.[0-9]+[~a-z0-9]*)\).*$")
-    bug_line_rc = compile("LP:\s#[0-9]+")
-    bug_rc = compile("#([0-9]+)")
+    bug_rc = compile("LP:\s#([0-9]+)")
 
     # debian_directories
     #
@@ -176,11 +175,8 @@  class Debian:
                 bugs = []
             else:
                 # find bug numbers and append them to the list
-                bug_line_matches = cls.bug_line_rc.search(line)
-                if bug_line_matches:
-                    bug_matches = findall(cls.bug_rc,line)
-                    if bug_matches:
-                        bugs.extend( bug_matches )
+                bug_matches = findall(cls.bug_rc,line)
+                bugs.extend(bug_matches)
 
                 content.append(line)