diff mbox series

[v2,1/2] Implement list filtering

Message ID 20180214133429.18848-1-vkabatov@redhat.com
State Accepted
Headers show
Series [v2,1/2] Implement list filtering | expand

Commit Message

Veronika Kabatova Feb. 14, 2018, 1:34 p.m. UTC
From: Veronika Kabatova <vkabatov@redhat.com>

Sometimes, multiple projects reside at the same mailing list. So far,
Patchwork only allowed a single project per mailing list, which made it
impossible for these projects to use Patchwork (unless they did some
dirty hacks).

Add a new property `subject_match` to projects and implement filtering
on (list_id, subject_match) match instead of solely list_id. Instance
admin can specify a regex on a per-project basis when the project is
created.

Signed-off-by: Veronika Kabatova <vkabatov@redhat.com>
---
Changes since v1:
- allow default project (with empty subject match), have help string
  reflect that
- add testcase (patch 2/2)
- --list-id option now respects subject matching too -- unify
  find_project_by_id() and find_project_by_header() functions
---
 docs/api.yaml                                      |  3 ++
 patchwork/api/project.py                           |  5 +--
 .../0022_add_subject_match_to_project.py           | 29 +++++++++++++++
 patchwork/models.py                                | 10 +++++-
 patchwork/parser.py                                | 41 ++++++++++++++--------
 patchwork/tests/test_parser.py                     | 14 ++++----
 .../notes/list-filtering-4643d98b4064367a.yaml     | 11 ++++++
 7 files changed, 88 insertions(+), 25 deletions(-)
 create mode 100644 patchwork/migrations/0022_add_subject_match_to_project.py
 create mode 100644 releasenotes/notes/list-filtering-4643d98b4064367a.yaml

Comments

Stephen Finucane Feb. 27, 2018, 12:07 p.m. UTC | #1
On Wed, 2018-02-14 at 14:34 +0100, vkabatov@redhat.com wrote:
> From: Veronika Kabatova <vkabatov@redhat.com>
> 
> Sometimes, multiple projects reside at the same mailing list. So far,
> Patchwork only allowed a single project per mailing list, which made it
> impossible for these projects to use Patchwork (unless they did some
> dirty hacks).
> 
> Add a new property `subject_match` to projects and implement filtering
> on (list_id, subject_match) match instead of solely list_id. Instance
> admin can specify a regex on a per-project basis when the project is
> created.
> 
> Signed-off-by: Veronika Kabatova <vkabatov@redhat.com>

Both of these make perfect sense to me. I'm still not sure about using
a regex, due to the security implications of same, but this is an
admin-only interface so meh.

Reviewed-by: Stephen Finucane <stephen@that.guru>

and applied.

> ---
> Changes since v1:
> - allow default project (with empty subject match), have help string
>   reflect that
> - add testcase (patch 2/2)
> - --list-id option now respects subject matching too -- unify
>   find_project_by_id() and find_project_by_header() functions
> ---
>  docs/api.yaml                                      |  3 ++
>  patchwork/api/project.py                           |  5 +--
>  .../0022_add_subject_match_to_project.py           | 29 +++++++++++++++
>  patchwork/models.py                                | 10 +++++-
>  patchwork/parser.py                                | 41 ++++++++++++++--------
>  patchwork/tests/test_parser.py                     | 14 ++++----
>  .../notes/list-filtering-4643d98b4064367a.yaml     | 11 ++++++
>  7 files changed, 88 insertions(+), 25 deletions(-)
>  create mode 100644 patchwork/migrations/0022_add_subject_match_to_project.py
>  create mode 100644 releasenotes/notes/list-filtering-4643d98b4064367a.yaml
> 
> diff --git a/docs/api.yaml b/docs/api.yaml
> index 3e79f0b..3373226 100644
> --- a/docs/api.yaml
> +++ b/docs/api.yaml
> @@ -374,6 +374,9 @@ definitions:
>        list_id:
>          type: string
>          description: Mailing list identifier for project.
> +      subject_match:
> +        type: string
> +        description: Regex used for email filtering.
>        list_email:
>          type: string
>          description: Mailing list email address for project.
> diff --git a/patchwork/api/project.py b/patchwork/api/project.py
> index 446c473..597f605 100644
> --- a/patchwork/api/project.py
> +++ b/patchwork/api/project.py
> @@ -39,8 +39,9 @@ class ProjectSerializer(HyperlinkedModelSerializer):
>      class Meta:
>          model = Project
>          fields = ('id', 'url', 'name', 'link_name', 'list_id', 'list_email',
> -                  'web_url', 'scm_url', 'webscm_url', 'maintainers')
> -        read_only_fields = ('name', 'maintainers')
> +                  'web_url', 'scm_url', 'webscm_url', 'maintainers',
> +                  'subject_match')
> +        read_only_fields = ('name', 'maintainers', 'subject_match')
>          extra_kwargs = {
>              'url': {'view_name': 'api-project-detail'},
>          }
> diff --git a/patchwork/migrations/0022_add_subject_match_to_project.py b/patchwork/migrations/0022_add_subject_match_to_project.py
> new file mode 100644
> index 0000000..cef3fb6
> --- /dev/null
> +++ b/patchwork/migrations/0022_add_subject_match_to_project.py
> @@ -0,0 +1,29 @@
> +# -*- coding: utf-8 -*-
> +# Generated by Django 1.10.8 on 2018-01-19 18:16
> +from __future__ import unicode_literals
> +
> +from django.db import migrations, models
> +
> +
> +class Migration(migrations.Migration):
> +
> +    dependencies = [
> +        ('patchwork', '0021_django_1_10_fixes'),
> +    ]
> +
> +    operations = [
> +        migrations.AddField(
> +            model_name='project',
> +            name='subject_match',
> +            field=models.CharField(blank=True, default=b'', help_text=b'Regex to match the subject against if only part of emails sent to the list belongs to this project. Will be used with IGNORECASE and MULTILINE flags. If rules for more projects match the first one returned from DB is chosen; empty field serves as a default for every email which has no other match.', max_length=64),
> +        ),
> +        migrations.AlterField(
> +            model_name='project',
> +            name='listid',
> +            field=models.CharField(max_length=255),
> +        ),
> +        migrations.AlterUniqueTogether(
> +            name='project',
> +            unique_together=set([('listid', 'subject_match')]),
> +        ),
> +    ]
> diff --git a/patchwork/models.py b/patchwork/models.py
> index a8bb015..581dbf7 100644
> --- a/patchwork/models.py
> +++ b/patchwork/models.py
> @@ -71,8 +71,15 @@ class Project(models.Model):
>  
>      linkname = models.CharField(max_length=255, unique=True)
>      name = models.CharField(max_length=255, unique=True)
> -    listid = models.CharField(max_length=255, unique=True)
> +    listid = models.CharField(max_length=255)
>      listemail = models.CharField(max_length=200)
> +    subject_match = models.CharField(
> +        max_length=64, blank=True, default='', help_text='Regex to match the '
> +        'subject against if only part of emails sent to the list belongs to '
> +        'this project. Will be used with IGNORECASE and MULTILINE flags. If '
> +        'rules for more projects match the first one returned from DB is '
> +        'chosen; empty field serves as a default for every email which has no '
> +        'other match.')
>  
>      # url metadata
>  
> @@ -100,6 +107,7 @@ class Project(models.Model):
>          return self.name
>  
>      class Meta:
> +        unique_together = (('listid', 'subject_match'),)
>          ordering = ['linkname']
>  
>  
> diff --git a/patchwork/parser.py b/patchwork/parser.py
> index 2cabb3c..9935397 100644
> --- a/patchwork/parser.py
> +++ b/patchwork/parser.py
> @@ -150,17 +150,30 @@ def clean_header(header):
>      return normalise_space(header_str)
>  
>  
> -def find_project_by_id(list_id):
> -    """Find a `project` object with given `list_id`."""
> -    project = None
> -    try:
> -        project = Project.objects.get(listid=list_id)
> -    except Project.DoesNotExist:
> -        logger.debug("'%s' if not a valid project list-id", list_id)
> -    return project
> +def find_project_by_id_and_subject(list_id, subject):
> +    """Find a `project` object based on `list_id` and subject match.
> +    Since empty `subject_match` field matches everything, project with
> +    given `list_id` and empty `subject_match` field serves as a default
> +    (in case it exists) if no other match is found.
> +    """
> +    projects = Project.objects.filter(listid=list_id)
> +    default = None
> +    for project in projects:
> +        if not project.subject_match:
> +            default = project
> +        elif re.search(project.subject_match, subject,
> +                       re.MULTILINE | re.IGNORECASE):
> +            return project
> +
> +    return default
>  
>  
> -def find_project_by_header(mail):
> +def find_project(mail, list_id=None):
> +    clean_subject = clean_header(mail.get('Subject', ''))
> +
> +    if list_id:
> +        return find_project_by_id_and_subject(list_id, clean_subject)
> +
>      project = None
>      listid_res = [re.compile(r'.*<([^>]+)>.*', re.S),
>                    re.compile(r'^([\S]+)$', re.S)]
> @@ -181,12 +194,13 @@ def find_project_by_header(mail):
>  
>              listid = match.group(1)
>  
> -            project = find_project_by_id(listid)
> +            project = find_project_by_id_and_subject(listid, clean_subject)
>              if project:
>                  break
>  
>      if not project:
> -        logger.debug("Could not find a list-id in mail headers")
> +        logger.debug("Could not find a valid project for given list-id and "
> +                     "subject.")
>  
>      return project
>  
> @@ -923,10 +937,7 @@ def parse_mail(mail, list_id=None):
>          logger.debug("Ignoring email due to 'ignore' hint")
>          return
>  
> -    if list_id:
> -        project = find_project_by_id(list_id)
> -    else:
> -        project = find_project_by_header(mail)
> +    project = find_project(mail, list_id)
>  
>      if project is None:
>          logger.error('Failed to find a project for email')
> diff --git a/patchwork/tests/test_parser.py b/patchwork/tests/test_parser.py
> index 20d70af..abe11ad 100644
> --- a/patchwork/tests/test_parser.py
> +++ b/patchwork/tests/test_parser.py
> @@ -36,7 +36,7 @@ from patchwork.models import State
>  from patchwork.parser import clean_subject
>  from patchwork.parser import find_author
>  from patchwork.parser import find_patch_content as find_content
> -from patchwork.parser import find_project_by_header
> +from patchwork.parser import find_project
>  from patchwork.parser import find_series
>  from patchwork.parser import parse_mail as _parse_mail
>  from patchwork.parser import parse_pull_request
> @@ -496,25 +496,25 @@ class ListIdHeaderTest(TestCase):
>  
>      def test_no_list_id(self):
>          email = MIMEText('')
> -        project = find_project_by_header(email)
> +        project = find_project(email)
>          self.assertEqual(project, None)
>  
>      def test_blank_list_id(self):
>          email = MIMEText('')
>          email['List-Id'] = ''
> -        project = find_project_by_header(email)
> +        project = find_project(email)
>          self.assertEqual(project, None)
>  
>      def test_whitespace_list_id(self):
>          email = MIMEText('')
>          email['List-Id'] = ' '
> -        project = find_project_by_header(email)
> +        project = find_project(email)
>          self.assertEqual(project, None)
>  
>      def test_substring_list_id(self):
>          email = MIMEText('')
>          email['List-Id'] = 'example.com'
> -        project = find_project_by_header(email)
> +        project = find_project(email)
>          self.assertEqual(project, None)
>  
>      def test_short_list_id(self):
> @@ -522,13 +522,13 @@ class ListIdHeaderTest(TestCase):
>             is only the list ID itself (without enclosing angle-brackets). """
>          email = MIMEText('')
>          email['List-Id'] = self.project.listid
> -        project = find_project_by_header(email)
> +        project = find_project(email)
>          self.assertEqual(project, self.project)
>  
>      def test_long_list_id(self):
>          email = MIMEText('')
>          email['List-Id'] = 'Test text <%s>' % self.project.listid
> -        project = find_project_by_header(email)
> +        project = find_project(email)
>          self.assertEqual(project, self.project)
>  
>  
> diff --git a/releasenotes/notes/list-filtering-4643d98b4064367a.yaml b/releasenotes/notes/list-filtering-4643d98b4064367a.yaml
> new file mode 100644
> index 0000000..0f849a7
> --- /dev/null
> +++ b/releasenotes/notes/list-filtering-4643d98b4064367a.yaml
> @@ -0,0 +1,11 @@
> +---
> +features:
> +  - |
> +    Allow list filtering into multiple projects (and email dropping) based on
> +    subject prefixes. Enable by specifying a regular expression which needs to
> +    be matched in the subject on a per-project basis (field `subject_match`).
> +    Project with empty `subject_match` field (and matching `list_id`) serves as
> +    a default in case of no match.
> +api:
> +  - |
> +    Expose `subject_match` in REST API.
diff mbox series

Patch

diff --git a/docs/api.yaml b/docs/api.yaml
index 3e79f0b..3373226 100644
--- a/docs/api.yaml
+++ b/docs/api.yaml
@@ -374,6 +374,9 @@  definitions:
       list_id:
         type: string
         description: Mailing list identifier for project.
+      subject_match:
+        type: string
+        description: Regex used for email filtering.
       list_email:
         type: string
         description: Mailing list email address for project.
diff --git a/patchwork/api/project.py b/patchwork/api/project.py
index 446c473..597f605 100644
--- a/patchwork/api/project.py
+++ b/patchwork/api/project.py
@@ -39,8 +39,9 @@  class ProjectSerializer(HyperlinkedModelSerializer):
     class Meta:
         model = Project
         fields = ('id', 'url', 'name', 'link_name', 'list_id', 'list_email',
-                  'web_url', 'scm_url', 'webscm_url', 'maintainers')
-        read_only_fields = ('name', 'maintainers')
+                  'web_url', 'scm_url', 'webscm_url', 'maintainers',
+                  'subject_match')
+        read_only_fields = ('name', 'maintainers', 'subject_match')
         extra_kwargs = {
             'url': {'view_name': 'api-project-detail'},
         }
diff --git a/patchwork/migrations/0022_add_subject_match_to_project.py b/patchwork/migrations/0022_add_subject_match_to_project.py
new file mode 100644
index 0000000..cef3fb6
--- /dev/null
+++ b/patchwork/migrations/0022_add_subject_match_to_project.py
@@ -0,0 +1,29 @@ 
+# -*- coding: utf-8 -*-
+# Generated by Django 1.10.8 on 2018-01-19 18:16
+from __future__ import unicode_literals
+
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+
+    dependencies = [
+        ('patchwork', '0021_django_1_10_fixes'),
+    ]
+
+    operations = [
+        migrations.AddField(
+            model_name='project',
+            name='subject_match',
+            field=models.CharField(blank=True, default=b'', help_text=b'Regex to match the subject against if only part of emails sent to the list belongs to this project. Will be used with IGNORECASE and MULTILINE flags. If rules for more projects match the first one returned from DB is chosen; empty field serves as a default for every email which has no other match.', max_length=64),
+        ),
+        migrations.AlterField(
+            model_name='project',
+            name='listid',
+            field=models.CharField(max_length=255),
+        ),
+        migrations.AlterUniqueTogether(
+            name='project',
+            unique_together=set([('listid', 'subject_match')]),
+        ),
+    ]
diff --git a/patchwork/models.py b/patchwork/models.py
index a8bb015..581dbf7 100644
--- a/patchwork/models.py
+++ b/patchwork/models.py
@@ -71,8 +71,15 @@  class Project(models.Model):
 
     linkname = models.CharField(max_length=255, unique=True)
     name = models.CharField(max_length=255, unique=True)
-    listid = models.CharField(max_length=255, unique=True)
+    listid = models.CharField(max_length=255)
     listemail = models.CharField(max_length=200)
+    subject_match = models.CharField(
+        max_length=64, blank=True, default='', help_text='Regex to match the '
+        'subject against if only part of emails sent to the list belongs to '
+        'this project. Will be used with IGNORECASE and MULTILINE flags. If '
+        'rules for more projects match the first one returned from DB is '
+        'chosen; empty field serves as a default for every email which has no '
+        'other match.')
 
     # url metadata
 
@@ -100,6 +107,7 @@  class Project(models.Model):
         return self.name
 
     class Meta:
+        unique_together = (('listid', 'subject_match'),)
         ordering = ['linkname']
 
 
diff --git a/patchwork/parser.py b/patchwork/parser.py
index 2cabb3c..9935397 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -150,17 +150,30 @@  def clean_header(header):
     return normalise_space(header_str)
 
 
-def find_project_by_id(list_id):
-    """Find a `project` object with given `list_id`."""
-    project = None
-    try:
-        project = Project.objects.get(listid=list_id)
-    except Project.DoesNotExist:
-        logger.debug("'%s' if not a valid project list-id", list_id)
-    return project
+def find_project_by_id_and_subject(list_id, subject):
+    """Find a `project` object based on `list_id` and subject match.
+    Since empty `subject_match` field matches everything, project with
+    given `list_id` and empty `subject_match` field serves as a default
+    (in case it exists) if no other match is found.
+    """
+    projects = Project.objects.filter(listid=list_id)
+    default = None
+    for project in projects:
+        if not project.subject_match:
+            default = project
+        elif re.search(project.subject_match, subject,
+                       re.MULTILINE | re.IGNORECASE):
+            return project
+
+    return default
 
 
-def find_project_by_header(mail):
+def find_project(mail, list_id=None):
+    clean_subject = clean_header(mail.get('Subject', ''))
+
+    if list_id:
+        return find_project_by_id_and_subject(list_id, clean_subject)
+
     project = None
     listid_res = [re.compile(r'.*<([^>]+)>.*', re.S),
                   re.compile(r'^([\S]+)$', re.S)]
@@ -181,12 +194,13 @@  def find_project_by_header(mail):
 
             listid = match.group(1)
 
-            project = find_project_by_id(listid)
+            project = find_project_by_id_and_subject(listid, clean_subject)
             if project:
                 break
 
     if not project:
-        logger.debug("Could not find a list-id in mail headers")
+        logger.debug("Could not find a valid project for given list-id and "
+                     "subject.")
 
     return project
 
@@ -923,10 +937,7 @@  def parse_mail(mail, list_id=None):
         logger.debug("Ignoring email due to 'ignore' hint")
         return
 
-    if list_id:
-        project = find_project_by_id(list_id)
-    else:
-        project = find_project_by_header(mail)
+    project = find_project(mail, list_id)
 
     if project is None:
         logger.error('Failed to find a project for email')
diff --git a/patchwork/tests/test_parser.py b/patchwork/tests/test_parser.py
index 20d70af..abe11ad 100644
--- a/patchwork/tests/test_parser.py
+++ b/patchwork/tests/test_parser.py
@@ -36,7 +36,7 @@  from patchwork.models import State
 from patchwork.parser import clean_subject
 from patchwork.parser import find_author
 from patchwork.parser import find_patch_content as find_content
-from patchwork.parser import find_project_by_header
+from patchwork.parser import find_project
 from patchwork.parser import find_series
 from patchwork.parser import parse_mail as _parse_mail
 from patchwork.parser import parse_pull_request
@@ -496,25 +496,25 @@  class ListIdHeaderTest(TestCase):
 
     def test_no_list_id(self):
         email = MIMEText('')
-        project = find_project_by_header(email)
+        project = find_project(email)
         self.assertEqual(project, None)
 
     def test_blank_list_id(self):
         email = MIMEText('')
         email['List-Id'] = ''
-        project = find_project_by_header(email)
+        project = find_project(email)
         self.assertEqual(project, None)
 
     def test_whitespace_list_id(self):
         email = MIMEText('')
         email['List-Id'] = ' '
-        project = find_project_by_header(email)
+        project = find_project(email)
         self.assertEqual(project, None)
 
     def test_substring_list_id(self):
         email = MIMEText('')
         email['List-Id'] = 'example.com'
-        project = find_project_by_header(email)
+        project = find_project(email)
         self.assertEqual(project, None)
 
     def test_short_list_id(self):
@@ -522,13 +522,13 @@  class ListIdHeaderTest(TestCase):
            is only the list ID itself (without enclosing angle-brackets). """
         email = MIMEText('')
         email['List-Id'] = self.project.listid
-        project = find_project_by_header(email)
+        project = find_project(email)
         self.assertEqual(project, self.project)
 
     def test_long_list_id(self):
         email = MIMEText('')
         email['List-Id'] = 'Test text <%s>' % self.project.listid
-        project = find_project_by_header(email)
+        project = find_project(email)
         self.assertEqual(project, self.project)
 
 
diff --git a/releasenotes/notes/list-filtering-4643d98b4064367a.yaml b/releasenotes/notes/list-filtering-4643d98b4064367a.yaml
new file mode 100644
index 0000000..0f849a7
--- /dev/null
+++ b/releasenotes/notes/list-filtering-4643d98b4064367a.yaml
@@ -0,0 +1,11 @@ 
+---
+features:
+  - |
+    Allow list filtering into multiple projects (and email dropping) based on
+    subject prefixes. Enable by specifying a regular expression which needs to
+    be matched in the subject on a per-project basis (field `subject_match`).
+    Project with empty `subject_match` field (and matching `list_id`) serves as
+    a default in case of no match.
+api:
+  - |
+    Expose `subject_match` in REST API.