diff mbox series

Implement list filtering

Message ID 20180124132727.26833-1-vkabatov@redhat.com
State Changes Requested
Headers show
Series Implement list filtering | expand

Commit Message

Veronika Kabatova Jan. 24, 2018, 1:27 p.m. UTC
From: Veronika Kabatova <vkabatov@redhat.com>

Sometimes, multiple projects reside at the same mailing list. So far,
Patchwork only allowed a single project per mailing list, which made it
impossible for these projects to use Patchwork (unless they did some
dirty hacks).

Add a new property `subject_match` to projects and implement filtering
on (list_id, subject_match) match instead of solely list_id. Instance
admin can specify a regex on a per-project basis when the project is
created.

Signed-off-by: Veronika Kabatova <vkabatov@redhat.com>
---
 docs/api.yaml                                      |  3 +++
 docs/deployment/management.rst                     |  4 +--
 patchwork/api/project.py                           |  5 ++--
 .../0021_add_subject_match_to_project.py           | 29 ++++++++++++++++++++++
 patchwork/models.py                                |  9 ++++++-
 patchwork/parser.py                                | 19 +++++++++++++-
 .../notes/list-filtering-4643d98b4064367a.yaml     | 10 ++++++++
 7 files changed, 73 insertions(+), 6 deletions(-)
 create mode 100644 patchwork/migrations/0021_add_subject_match_to_project.py
 create mode 100644 releasenotes/notes/list-filtering-4643d98b4064367a.yaml

Comments

Daniel Axtens Jan. 25, 2018, 6:57 a.m. UTC | #1
vkabatov@redhat.com writes:

> From: Veronika Kabatova <vkabatov@redhat.com>
>
> Sometimes, multiple projects reside at the same mailing list. So far,
> Patchwork only allowed a single project per mailing list, which made it
> impossible for these projects to use Patchwork (unless they did some
> dirty hacks).
>
> Add a new property `subject_match` to projects and implement filtering
> on (list_id, subject_match) match instead of solely list_id. Instance
> admin can specify a regex on a per-project basis when the project is
> created.

(I haven't thought a great deal about this, so don't take this as a nack)

Two general thoughts:

 - netdev has several projects sharing a mailing list and they seem to
   get along fine...

 - I am pretty sure Snowpatch deals with this somehow - Snowpatch does
   CI on patchwork, so they need to know what project is used. Andrew,
   Russell - how do you deal with this there?

I suppose to put a finer point on it - what is your usecase here? What
are you trying to achieve, and can we help you do that in a way that
requires smaller changes to Patchwork, and is less fragile? (For example
this is going to break if someone mis-spells the keyword you're looking
for in the subject_match.)

Regards,
Daniel

>
> Signed-off-by: Veronika Kabatova <vkabatov@redhat.com>
> ---
>  docs/api.yaml                                      |  3 +++
>  docs/deployment/management.rst                     |  4 +--
>  patchwork/api/project.py                           |  5 ++--
>  .../0021_add_subject_match_to_project.py           | 29 ++++++++++++++++++++++
>  patchwork/models.py                                |  9 ++++++-
>  patchwork/parser.py                                | 19 +++++++++++++-
>  .../notes/list-filtering-4643d98b4064367a.yaml     | 10 ++++++++
>  7 files changed, 73 insertions(+), 6 deletions(-)
>  create mode 100644 patchwork/migrations/0021_add_subject_match_to_project.py
>  create mode 100644 releasenotes/notes/list-filtering-4643d98b4064367a.yaml
>
> diff --git a/docs/api.yaml b/docs/api.yaml
> index 3e79f0b..3373226 100644
> --- a/docs/api.yaml
> +++ b/docs/api.yaml
> @@ -374,6 +374,9 @@ definitions:
>        list_id:
>          type: string
>          description: Mailing list identifier for project.
> +      subject_match:
> +        type: string
> +        description: Regex used for email filtering.
>        list_email:
>          type: string
>          description: Mailing list email address for project.
> diff --git a/docs/deployment/management.rst b/docs/deployment/management.rst
> index c50b7b6..870d7ee 100644
> --- a/docs/deployment/management.rst
> +++ b/docs/deployment/management.rst
> @@ -63,7 +63,7 @@ due to, for example, an outage.
>  .. option:: --list-id <list-id>
>  
>     mailing list ID. If not supplied, this will be extracted from the mail
> -   headers.
> +   headers. Don't use this option if you require filtering based on subjects.
>  
>  .. option:: infile
>  
> @@ -88,7 +88,7 @@ the :ref:`deployment installation guide <deployment-parsemail>`.
>  .. option:: --list-id <list-id>
>  
>     mailing list ID. If not supplied, this will be extracted from the mail
> -   headers.
> +   headers. Don't use this option if you require filtering based on subjects.
>  
>  .. option:: infile
>  
> diff --git a/patchwork/api/project.py b/patchwork/api/project.py
> index 446c473..597f605 100644
> --- a/patchwork/api/project.py
> +++ b/patchwork/api/project.py
> @@ -39,8 +39,9 @@ class ProjectSerializer(HyperlinkedModelSerializer):
>      class Meta:
>          model = Project
>          fields = ('id', 'url', 'name', 'link_name', 'list_id', 'list_email',
> -                  'web_url', 'scm_url', 'webscm_url', 'maintainers')
> -        read_only_fields = ('name', 'maintainers')
> +                  'web_url', 'scm_url', 'webscm_url', 'maintainers',
> +                  'subject_match')
> +        read_only_fields = ('name', 'maintainers', 'subject_match')
>          extra_kwargs = {
>              'url': {'view_name': 'api-project-detail'},
>          }
> diff --git a/patchwork/migrations/0021_add_subject_match_to_project.py b/patchwork/migrations/0021_add_subject_match_to_project.py
> new file mode 100644
> index 0000000..6066266
> --- /dev/null
> +++ b/patchwork/migrations/0021_add_subject_match_to_project.py
> @@ -0,0 +1,29 @@
> +# -*- coding: utf-8 -*-
> +# Generated by Django 1.10.8 on 2018-01-19 18:16
> +from __future__ import unicode_literals
> +
> +from django.db import migrations, models
> +
> +
> +class Migration(migrations.Migration):
> +
> +    dependencies = [
> +        ('patchwork', '0020_tag_show_column'),
> +    ]
> +
> +    operations = [
> +        migrations.AddField(
> +            model_name='project',
> +            name='subject_match',
> +            field=models.CharField(blank=True, default=b'', help_text=b'Regex to match the subject against if only part of emails sent to the list belongs to this project. Will be used with IGNORECASE and MULTILINE flags. If rules for more projects match the first one returned from DB is chosen.', max_length=64),
> +        ),
> +        migrations.AlterField(
> +            model_name='project',
> +            name='listid',
> +            field=models.CharField(max_length=255),
> +        ),
> +        migrations.AlterUniqueTogether(
> +            name='project',
> +            unique_together=set([('listid', 'subject_match')]),
> +        ),
> +    ]
> diff --git a/patchwork/models.py b/patchwork/models.py
> index 11886f1..907707f 100644
> --- a/patchwork/models.py
> +++ b/patchwork/models.py
> @@ -71,8 +71,14 @@ class Project(models.Model):
>  
>      linkname = models.CharField(max_length=255, unique=True)
>      name = models.CharField(max_length=255, unique=True)
> -    listid = models.CharField(max_length=255, unique=True)
> +    listid = models.CharField(max_length=255)
>      listemail = models.CharField(max_length=200)
> +    subject_match = models.CharField(
> +        max_length=64, blank=True, default='', help_text='Regex to match the '
> +        'subject against if only part of emails sent to the list belongs to '
> +        'this project. Will be used with IGNORECASE and MULTILINE flags. If '
> +        'rules for more projects match the first one returned from DB is '
> +        'chosen.')
>  
>      # url metadata
>  
> @@ -100,6 +106,7 @@ class Project(models.Model):
>          return self.name
>  
>      class Meta:
> +        unique_together = (('listid', 'subject_match'),)
>          ordering = ['linkname']
>  
>  
> diff --git a/patchwork/parser.py b/patchwork/parser.py
> index ac7dc5f..015f709 100644
> --- a/patchwork/parser.py
> +++ b/patchwork/parser.py
> @@ -157,9 +157,25 @@ def find_project_by_id(list_id):
>          project = Project.objects.get(listid=list_id)
>      except Project.DoesNotExist:
>          logger.debug("'%s' if not a valid project list-id", list_id)
> +    except Project.MultipleObjectsReturned:
> +        logger.debug("Multiple projects with list-id '%s' found", list_id)
>      return project
>  
>  
> +def find_project_by_id_and_subject(list_id, subject):
> +    """Find a `project` object based on `list_id` and subject prefix match."""
> +    projects = Project.objects.filter(listid=list_id)
> +    for project in projects:
> +        if (not project.subject_match or
> +                re.search(project.subject_match, subject,
> +                          re.MULTILINE | re.IGNORECASE)):
> +            return project
> +
> +    logger.debug("No project to match email with list-id '%s', subject '%s' "
> +                 "found", list_id, subject)
> +    return None
> +
> +
>  def find_project_by_header(mail):
>      project = None
>      listid_res = [re.compile(r'.*<([^>]+)>.*', re.S),
> @@ -181,7 +197,8 @@ def find_project_by_header(mail):
>  
>              listid = match.group(1)
>  
> -            project = find_project_by_id(listid)
> +            project = find_project_by_id_and_subject(listid,
> +                                                     mail.get('Subject'))
>              if project:
>                  break
>  
> diff --git a/releasenotes/notes/list-filtering-4643d98b4064367a.yaml b/releasenotes/notes/list-filtering-4643d98b4064367a.yaml
> new file mode 100644
> index 0000000..21c1680
> --- /dev/null
> +++ b/releasenotes/notes/list-filtering-4643d98b4064367a.yaml
> @@ -0,0 +1,10 @@
> +---
> +features:
> +  - |
> +    Allow list filtering into multiple projects (and email dropping) based on
> +    subject prefixes. Disabled by default, enable by specifying a regular
> +    expression which needs to be matched in the subject on a per-project basis
> +    (field `subject_match`).
> +api:
> +  - |
> +    Expose `subject_match` in REST API.
> -- 
> 2.13.6
>
> _______________________________________________
> Patchwork mailing list
> Patchwork@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/patchwork
Don Zickus Jan. 29, 2018, 11:09 a.m. UTC | #2
On Thu, Jan 25, 2018 at 05:57:47PM +1100, Daniel Axtens wrote:
> vkabatov@redhat.com writes:
> 
> > From: Veronika Kabatova <vkabatov@redhat.com>
> >
> > Sometimes, multiple projects reside at the same mailing list. So far,
> > Patchwork only allowed a single project per mailing list, which made it
> > impossible for these projects to use Patchwork (unless they did some
> > dirty hacks).
> >
> > Add a new property `subject_match` to projects and implement filtering
> > on (list_id, subject_match) match instead of solely list_id. Instance
> > admin can specify a regex on a per-project basis when the project is
> > created.
> 
 Hi Daniel,

Thanks for the reply.

> (I haven't thought a great deal about this, so don't take this as a nack)
> 
> Two general thoughts:
> 
>  - netdev has several projects sharing a mailing list and they seem to
>    get along fine...
> 
>  - I am pretty sure Snowpatch deals with this somehow - Snowpatch does
>    CI on patchwork, so they need to know what project is used. Andrew,
>    Russell - how do you deal with this there?

I am not familiar with how they do their parsing.

> 
> I suppose to put a finer point on it - what is your usecase here? What
> are you trying to achieve, and can we help you do that in a way that
> requires smaller changes to Patchwork, and is less fragile? (For example
> this is going to break if someone mis-spells the keyword you're looking
> for in the subject_match.)

So here is our use case.  Internally at Red Hat we use one mailing list to
post all of our kernel patches.  However, being a distro company, patches
can be applied to either RHEL-6,7,X.  For the last 15 years we have always
used the method:

[RHEL-7 PATCH] instead of [PATCH].

The project inside the []s has been what we filter through our regex.  Is it
prone to human typos?  Yes.  Most folks have stuck this in the .git/config
subjectprefix option.  That limited the typos. It isn't perfect.

We have been using a hacked up PatchWork1 for the last 7 or so years.  This
is one of those features we need (or something to solve our problem) if we
want to migrate to a PatchWork2 instance.

I hope that adds a little bit of context on our thinking.  Thoughts?

Cheers,
Don

> 
> Regards,
> Daniel
> 
> >
> > Signed-off-by: Veronika Kabatova <vkabatov@redhat.com>
> > ---
> >  docs/api.yaml                                      |  3 +++
> >  docs/deployment/management.rst                     |  4 +--
> >  patchwork/api/project.py                           |  5 ++--
> >  .../0021_add_subject_match_to_project.py           | 29 ++++++++++++++++++++++
> >  patchwork/models.py                                |  9 ++++++-
> >  patchwork/parser.py                                | 19 +++++++++++++-
> >  .../notes/list-filtering-4643d98b4064367a.yaml     | 10 ++++++++
> >  7 files changed, 73 insertions(+), 6 deletions(-)
> >  create mode 100644 patchwork/migrations/0021_add_subject_match_to_project.py
> >  create mode 100644 releasenotes/notes/list-filtering-4643d98b4064367a.yaml
> >
> > diff --git a/docs/api.yaml b/docs/api.yaml
> > index 3e79f0b..3373226 100644
> > --- a/docs/api.yaml
> > +++ b/docs/api.yaml
> > @@ -374,6 +374,9 @@ definitions:
> >        list_id:
> >          type: string
> >          description: Mailing list identifier for project.
> > +      subject_match:
> > +        type: string
> > +        description: Regex used for email filtering.
> >        list_email:
> >          type: string
> >          description: Mailing list email address for project.
> > diff --git a/docs/deployment/management.rst b/docs/deployment/management.rst
> > index c50b7b6..870d7ee 100644
> > --- a/docs/deployment/management.rst
> > +++ b/docs/deployment/management.rst
> > @@ -63,7 +63,7 @@ due to, for example, an outage.
> >  .. option:: --list-id <list-id>
> >  
> >     mailing list ID. If not supplied, this will be extracted from the mail
> > -   headers.
> > +   headers. Don't use this option if you require filtering based on subjects.
> >  
> >  .. option:: infile
> >  
> > @@ -88,7 +88,7 @@ the :ref:`deployment installation guide <deployment-parsemail>`.
> >  .. option:: --list-id <list-id>
> >  
> >     mailing list ID. If not supplied, this will be extracted from the mail
> > -   headers.
> > +   headers. Don't use this option if you require filtering based on subjects.
> >  
> >  .. option:: infile
> >  
> > diff --git a/patchwork/api/project.py b/patchwork/api/project.py
> > index 446c473..597f605 100644
> > --- a/patchwork/api/project.py
> > +++ b/patchwork/api/project.py
> > @@ -39,8 +39,9 @@ class ProjectSerializer(HyperlinkedModelSerializer):
> >      class Meta:
> >          model = Project
> >          fields = ('id', 'url', 'name', 'link_name', 'list_id', 'list_email',
> > -                  'web_url', 'scm_url', 'webscm_url', 'maintainers')
> > -        read_only_fields = ('name', 'maintainers')
> > +                  'web_url', 'scm_url', 'webscm_url', 'maintainers',
> > +                  'subject_match')
> > +        read_only_fields = ('name', 'maintainers', 'subject_match')
> >          extra_kwargs = {
> >              'url': {'view_name': 'api-project-detail'},
> >          }
> > diff --git a/patchwork/migrations/0021_add_subject_match_to_project.py b/patchwork/migrations/0021_add_subject_match_to_project.py
> > new file mode 100644
> > index 0000000..6066266
> > --- /dev/null
> > +++ b/patchwork/migrations/0021_add_subject_match_to_project.py
> > @@ -0,0 +1,29 @@
> > +# -*- coding: utf-8 -*-
> > +# Generated by Django 1.10.8 on 2018-01-19 18:16
> > +from __future__ import unicode_literals
> > +
> > +from django.db import migrations, models
> > +
> > +
> > +class Migration(migrations.Migration):
> > +
> > +    dependencies = [
> > +        ('patchwork', '0020_tag_show_column'),
> > +    ]
> > +
> > +    operations = [
> > +        migrations.AddField(
> > +            model_name='project',
> > +            name='subject_match',
> > +            field=models.CharField(blank=True, default=b'', help_text=b'Regex to match the subject against if only part of emails sent to the list belongs to this project. Will be used with IGNORECASE and MULTILINE flags. If rules for more projects match the first one returned from DB is chosen.', max_length=64),
> > +        ),
> > +        migrations.AlterField(
> > +            model_name='project',
> > +            name='listid',
> > +            field=models.CharField(max_length=255),
> > +        ),
> > +        migrations.AlterUniqueTogether(
> > +            name='project',
> > +            unique_together=set([('listid', 'subject_match')]),
> > +        ),
> > +    ]
> > diff --git a/patchwork/models.py b/patchwork/models.py
> > index 11886f1..907707f 100644
> > --- a/patchwork/models.py
> > +++ b/patchwork/models.py
> > @@ -71,8 +71,14 @@ class Project(models.Model):
> >  
> >      linkname = models.CharField(max_length=255, unique=True)
> >      name = models.CharField(max_length=255, unique=True)
> > -    listid = models.CharField(max_length=255, unique=True)
> > +    listid = models.CharField(max_length=255)
> >      listemail = models.CharField(max_length=200)
> > +    subject_match = models.CharField(
> > +        max_length=64, blank=True, default='', help_text='Regex to match the '
> > +        'subject against if only part of emails sent to the list belongs to '
> > +        'this project. Will be used with IGNORECASE and MULTILINE flags. If '
> > +        'rules for more projects match the first one returned from DB is '
> > +        'chosen.')
> >  
> >      # url metadata
> >  
> > @@ -100,6 +106,7 @@ class Project(models.Model):
> >          return self.name
> >  
> >      class Meta:
> > +        unique_together = (('listid', 'subject_match'),)
> >          ordering = ['linkname']
> >  
> >  
> > diff --git a/patchwork/parser.py b/patchwork/parser.py
> > index ac7dc5f..015f709 100644
> > --- a/patchwork/parser.py
> > +++ b/patchwork/parser.py
> > @@ -157,9 +157,25 @@ def find_project_by_id(list_id):
> >          project = Project.objects.get(listid=list_id)
> >      except Project.DoesNotExist:
> >          logger.debug("'%s' if not a valid project list-id", list_id)
> > +    except Project.MultipleObjectsReturned:
> > +        logger.debug("Multiple projects with list-id '%s' found", list_id)
> >      return project
> >  
> >  
> > +def find_project_by_id_and_subject(list_id, subject):
> > +    """Find a `project` object based on `list_id` and subject prefix match."""
> > +    projects = Project.objects.filter(listid=list_id)
> > +    for project in projects:
> > +        if (not project.subject_match or
> > +                re.search(project.subject_match, subject,
> > +                          re.MULTILINE | re.IGNORECASE)):
> > +            return project
> > +
> > +    logger.debug("No project to match email with list-id '%s', subject '%s' "
> > +                 "found", list_id, subject)
> > +    return None
> > +
> > +
> >  def find_project_by_header(mail):
> >      project = None
> >      listid_res = [re.compile(r'.*<([^>]+)>.*', re.S),
> > @@ -181,7 +197,8 @@ def find_project_by_header(mail):
> >  
> >              listid = match.group(1)
> >  
> > -            project = find_project_by_id(listid)
> > +            project = find_project_by_id_and_subject(listid,
> > +                                                     mail.get('Subject'))
> >              if project:
> >                  break
> >  
> > diff --git a/releasenotes/notes/list-filtering-4643d98b4064367a.yaml b/releasenotes/notes/list-filtering-4643d98b4064367a.yaml
> > new file mode 100644
> > index 0000000..21c1680
> > --- /dev/null
> > +++ b/releasenotes/notes/list-filtering-4643d98b4064367a.yaml
> > @@ -0,0 +1,10 @@
> > +---
> > +features:
> > +  - |
> > +    Allow list filtering into multiple projects (and email dropping) based on
> > +    subject prefixes. Disabled by default, enable by specifying a regular
> > +    expression which needs to be matched in the subject on a per-project basis
> > +    (field `subject_match`).
> > +api:
> > +  - |
> > +    Expose `subject_match` in REST API.
> > -- 
> > 2.13.6
> >
> > _______________________________________________
> > Patchwork mailing list
> > Patchwork@lists.ozlabs.org
> > https://lists.ozlabs.org/listinfo/patchwork
> _______________________________________________
> Patchwork mailing list
> Patchwork@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/patchwork
Daniel Axtens Jan. 29, 2018, 12:36 p.m. UTC | #3
Hi Don,

>> I suppose to put a finer point on it - what is your usecase here? What
>> are you trying to achieve, and can we help you do that in a way that
>> requires smaller changes to Patchwork, and is less fragile? (For example
>> this is going to break if someone mis-spells the keyword you're looking
>> for in the subject_match.)
>
> So here is our use case.  Internally at Red Hat we use one mailing list to
> post all of our kernel patches.  However, being a distro company, patches
> can be applied to either RHEL-6,7,X.  For the last 15 years we have always
> used the method:
>
> [RHEL-7 PATCH] instead of [PATCH].

Ah yes. I work for Canonical (although I do Patchwork in a private
capacity only) and our kernel team does something very similar.

> The project inside the []s has been what we filter through our regex.  Is it
> prone to human typos?  Yes.  Most folks have stuck this in the .git/config
> subjectprefix option.  That limited the typos. It isn't perfect.
>
> We have been using a hacked up PatchWork1 for the last 7 or so years.  This
> is one of those features we need (or something to solve our problem) if we
> want to migrate to a PatchWork2 instance.
>
> I hope that adds a little bit of context on our thinking.  Thoughts?

That's a compelling use-case and I'm happy to look at supporting it. I
will review the patch more closely in the coming days.

Regards,
Daniel

> Cheers,
> Don
>
>> 
>> Regards,
>> Daniel
>> 
>> >
>> > Signed-off-by: Veronika Kabatova <vkabatov@redhat.com>
>> > ---
>> >  docs/api.yaml                                      |  3 +++
>> >  docs/deployment/management.rst                     |  4 +--
>> >  patchwork/api/project.py                           |  5 ++--
>> >  .../0021_add_subject_match_to_project.py           | 29 ++++++++++++++++++++++
>> >  patchwork/models.py                                |  9 ++++++-
>> >  patchwork/parser.py                                | 19 +++++++++++++-
>> >  .../notes/list-filtering-4643d98b4064367a.yaml     | 10 ++++++++
>> >  7 files changed, 73 insertions(+), 6 deletions(-)
>> >  create mode 100644 patchwork/migrations/0021_add_subject_match_to_project.py
>> >  create mode 100644 releasenotes/notes/list-filtering-4643d98b4064367a.yaml
>> >
>> > diff --git a/docs/api.yaml b/docs/api.yaml
>> > index 3e79f0b..3373226 100644
>> > --- a/docs/api.yaml
>> > +++ b/docs/api.yaml
>> > @@ -374,6 +374,9 @@ definitions:
>> >        list_id:
>> >          type: string
>> >          description: Mailing list identifier for project.
>> > +      subject_match:
>> > +        type: string
>> > +        description: Regex used for email filtering.
>> >        list_email:
>> >          type: string
>> >          description: Mailing list email address for project.
>> > diff --git a/docs/deployment/management.rst b/docs/deployment/management.rst
>> > index c50b7b6..870d7ee 100644
>> > --- a/docs/deployment/management.rst
>> > +++ b/docs/deployment/management.rst
>> > @@ -63,7 +63,7 @@ due to, for example, an outage.
>> >  .. option:: --list-id <list-id>
>> >  
>> >     mailing list ID. If not supplied, this will be extracted from the mail
>> > -   headers.
>> > +   headers. Don't use this option if you require filtering based on subjects.
>> >  
>> >  .. option:: infile
>> >  
>> > @@ -88,7 +88,7 @@ the :ref:`deployment installation guide <deployment-parsemail>`.
>> >  .. option:: --list-id <list-id>
>> >  
>> >     mailing list ID. If not supplied, this will be extracted from the mail
>> > -   headers.
>> > +   headers. Don't use this option if you require filtering based on subjects.
>> >  
>> >  .. option:: infile
>> >  
>> > diff --git a/patchwork/api/project.py b/patchwork/api/project.py
>> > index 446c473..597f605 100644
>> > --- a/patchwork/api/project.py
>> > +++ b/patchwork/api/project.py
>> > @@ -39,8 +39,9 @@ class ProjectSerializer(HyperlinkedModelSerializer):
>> >      class Meta:
>> >          model = Project
>> >          fields = ('id', 'url', 'name', 'link_name', 'list_id', 'list_email',
>> > -                  'web_url', 'scm_url', 'webscm_url', 'maintainers')
>> > -        read_only_fields = ('name', 'maintainers')
>> > +                  'web_url', 'scm_url', 'webscm_url', 'maintainers',
>> > +                  'subject_match')
>> > +        read_only_fields = ('name', 'maintainers', 'subject_match')
>> >          extra_kwargs = {
>> >              'url': {'view_name': 'api-project-detail'},
>> >          }
>> > diff --git a/patchwork/migrations/0021_add_subject_match_to_project.py b/patchwork/migrations/0021_add_subject_match_to_project.py
>> > new file mode 100644
>> > index 0000000..6066266
>> > --- /dev/null
>> > +++ b/patchwork/migrations/0021_add_subject_match_to_project.py
>> > @@ -0,0 +1,29 @@
>> > +# -*- coding: utf-8 -*-
>> > +# Generated by Django 1.10.8 on 2018-01-19 18:16
>> > +from __future__ import unicode_literals
>> > +
>> > +from django.db import migrations, models
>> > +
>> > +
>> > +class Migration(migrations.Migration):
>> > +
>> > +    dependencies = [
>> > +        ('patchwork', '0020_tag_show_column'),
>> > +    ]
>> > +
>> > +    operations = [
>> > +        migrations.AddField(
>> > +            model_name='project',
>> > +            name='subject_match',
>> > +            field=models.CharField(blank=True, default=b'', help_text=b'Regex to match the subject against if only part of emails sent to the list belongs to this project. Will be used with IGNORECASE and MULTILINE flags. If rules for more projects match the first one returned from DB is chosen.', max_length=64),
>> > +        ),
>> > +        migrations.AlterField(
>> > +            model_name='project',
>> > +            name='listid',
>> > +            field=models.CharField(max_length=255),
>> > +        ),
>> > +        migrations.AlterUniqueTogether(
>> > +            name='project',
>> > +            unique_together=set([('listid', 'subject_match')]),
>> > +        ),
>> > +    ]
>> > diff --git a/patchwork/models.py b/patchwork/models.py
>> > index 11886f1..907707f 100644
>> > --- a/patchwork/models.py
>> > +++ b/patchwork/models.py
>> > @@ -71,8 +71,14 @@ class Project(models.Model):
>> >  
>> >      linkname = models.CharField(max_length=255, unique=True)
>> >      name = models.CharField(max_length=255, unique=True)
>> > -    listid = models.CharField(max_length=255, unique=True)
>> > +    listid = models.CharField(max_length=255)
>> >      listemail = models.CharField(max_length=200)
>> > +    subject_match = models.CharField(
>> > +        max_length=64, blank=True, default='', help_text='Regex to match the '
>> > +        'subject against if only part of emails sent to the list belongs to '
>> > +        'this project. Will be used with IGNORECASE and MULTILINE flags. If '
>> > +        'rules for more projects match the first one returned from DB is '
>> > +        'chosen.')
>> >  
>> >      # url metadata
>> >  
>> > @@ -100,6 +106,7 @@ class Project(models.Model):
>> >          return self.name
>> >  
>> >      class Meta:
>> > +        unique_together = (('listid', 'subject_match'),)
>> >          ordering = ['linkname']
>> >  
>> >  
>> > diff --git a/patchwork/parser.py b/patchwork/parser.py
>> > index ac7dc5f..015f709 100644
>> > --- a/patchwork/parser.py
>> > +++ b/patchwork/parser.py
>> > @@ -157,9 +157,25 @@ def find_project_by_id(list_id):
>> >          project = Project.objects.get(listid=list_id)
>> >      except Project.DoesNotExist:
>> >          logger.debug("'%s' if not a valid project list-id", list_id)
>> > +    except Project.MultipleObjectsReturned:
>> > +        logger.debug("Multiple projects with list-id '%s' found", list_id)
>> >      return project
>> >  
>> >  
>> > +def find_project_by_id_and_subject(list_id, subject):
>> > +    """Find a `project` object based on `list_id` and subject prefix match."""
>> > +    projects = Project.objects.filter(listid=list_id)
>> > +    for project in projects:
>> > +        if (not project.subject_match or
>> > +                re.search(project.subject_match, subject,
>> > +                          re.MULTILINE | re.IGNORECASE)):
>> > +            return project
>> > +
>> > +    logger.debug("No project to match email with list-id '%s', subject '%s' "
>> > +                 "found", list_id, subject)
>> > +    return None
>> > +
>> > +
>> >  def find_project_by_header(mail):
>> >      project = None
>> >      listid_res = [re.compile(r'.*<([^>]+)>.*', re.S),
>> > @@ -181,7 +197,8 @@ def find_project_by_header(mail):
>> >  
>> >              listid = match.group(1)
>> >  
>> > -            project = find_project_by_id(listid)
>> > +            project = find_project_by_id_and_subject(listid,
>> > +                                                     mail.get('Subject'))
>> >              if project:
>> >                  break
>> >  
>> > diff --git a/releasenotes/notes/list-filtering-4643d98b4064367a.yaml b/releasenotes/notes/list-filtering-4643d98b4064367a.yaml
>> > new file mode 100644
>> > index 0000000..21c1680
>> > --- /dev/null
>> > +++ b/releasenotes/notes/list-filtering-4643d98b4064367a.yaml
>> > @@ -0,0 +1,10 @@
>> > +---
>> > +features:
>> > +  - |
>> > +    Allow list filtering into multiple projects (and email dropping) based on
>> > +    subject prefixes. Disabled by default, enable by specifying a regular
>> > +    expression which needs to be matched in the subject on a per-project basis
>> > +    (field `subject_match`).
>> > +api:
>> > +  - |
>> > +    Expose `subject_match` in REST API.
>> > -- 
>> > 2.13.6
>> >
>> > _______________________________________________
>> > Patchwork mailing list
>> > Patchwork@lists.ozlabs.org
>> > https://lists.ozlabs.org/listinfo/patchwork
>> _______________________________________________
>> Patchwork mailing list
>> Patchwork@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/patchwork
Don Zickus Jan. 30, 2018, 2:34 p.m. UTC | #4
On Mon, Jan 29, 2018 at 11:36:36PM +1100, Daniel Axtens wrote:
> Hi Don,
> 
> >> I suppose to put a finer point on it - what is your usecase here? What
> >> are you trying to achieve, and can we help you do that in a way that
> >> requires smaller changes to Patchwork, and is less fragile? (For example
> >> this is going to break if someone mis-spells the keyword you're looking
> >> for in the subject_match.)
> >
> > So here is our use case.  Internally at Red Hat we use one mailing list to
> > post all of our kernel patches.  However, being a distro company, patches
> > can be applied to either RHEL-6,7,X.  For the last 15 years we have always
> > used the method:
> >
> > [RHEL-7 PATCH] instead of [PATCH].
> 
> Ah yes. I work for Canonical (although I do Patchwork in a private
> capacity only) and our kernel team does something very similar.
> 
> > The project inside the []s has been what we filter through our regex.  Is it
> > prone to human typos?  Yes.  Most folks have stuck this in the .git/config
> > subjectprefix option.  That limited the typos. It isn't perfect.
> >
> > We have been using a hacked up PatchWork1 for the last 7 or so years.  This
> > is one of those features we need (or something to solve our problem) if we
> > want to migrate to a PatchWork2 instance.
> >
> > I hope that adds a little bit of context on our thinking.  Thoughts?
> 
> That's a compelling use-case and I'm happy to look at supporting it. I
> will review the patch more closely in the coming days.

Thanks for your understanding and support!

Again, we know the solution has its 'human' limitations. :-) We just never
came up with a better idea. So any ideas there are welcomed too! :-)

Cheers,
Don

> 
> Regards,
> Daniel
> 
> > Cheers,
> > Don
> >
> >> 
> >> Regards,
> >> Daniel
> >> 
> >> >
> >> > Signed-off-by: Veronika Kabatova <vkabatov@redhat.com>
> >> > ---
> >> >  docs/api.yaml                                      |  3 +++
> >> >  docs/deployment/management.rst                     |  4 +--
> >> >  patchwork/api/project.py                           |  5 ++--
> >> >  .../0021_add_subject_match_to_project.py           | 29 ++++++++++++++++++++++
> >> >  patchwork/models.py                                |  9 ++++++-
> >> >  patchwork/parser.py                                | 19 +++++++++++++-
> >> >  .../notes/list-filtering-4643d98b4064367a.yaml     | 10 ++++++++
> >> >  7 files changed, 73 insertions(+), 6 deletions(-)
> >> >  create mode 100644 patchwork/migrations/0021_add_subject_match_to_project.py
> >> >  create mode 100644 releasenotes/notes/list-filtering-4643d98b4064367a.yaml
> >> >
> >> > diff --git a/docs/api.yaml b/docs/api.yaml
> >> > index 3e79f0b..3373226 100644
> >> > --- a/docs/api.yaml
> >> > +++ b/docs/api.yaml
> >> > @@ -374,6 +374,9 @@ definitions:
> >> >        list_id:
> >> >          type: string
> >> >          description: Mailing list identifier for project.
> >> > +      subject_match:
> >> > +        type: string
> >> > +        description: Regex used for email filtering.
> >> >        list_email:
> >> >          type: string
> >> >          description: Mailing list email address for project.
> >> > diff --git a/docs/deployment/management.rst b/docs/deployment/management.rst
> >> > index c50b7b6..870d7ee 100644
> >> > --- a/docs/deployment/management.rst
> >> > +++ b/docs/deployment/management.rst
> >> > @@ -63,7 +63,7 @@ due to, for example, an outage.
> >> >  .. option:: --list-id <list-id>
> >> >  
> >> >     mailing list ID. If not supplied, this will be extracted from the mail
> >> > -   headers.
> >> > +   headers. Don't use this option if you require filtering based on subjects.
> >> >  
> >> >  .. option:: infile
> >> >  
> >> > @@ -88,7 +88,7 @@ the :ref:`deployment installation guide <deployment-parsemail>`.
> >> >  .. option:: --list-id <list-id>
> >> >  
> >> >     mailing list ID. If not supplied, this will be extracted from the mail
> >> > -   headers.
> >> > +   headers. Don't use this option if you require filtering based on subjects.
> >> >  
> >> >  .. option:: infile
> >> >  
> >> > diff --git a/patchwork/api/project.py b/patchwork/api/project.py
> >> > index 446c473..597f605 100644
> >> > --- a/patchwork/api/project.py
> >> > +++ b/patchwork/api/project.py
> >> > @@ -39,8 +39,9 @@ class ProjectSerializer(HyperlinkedModelSerializer):
> >> >      class Meta:
> >> >          model = Project
> >> >          fields = ('id', 'url', 'name', 'link_name', 'list_id', 'list_email',
> >> > -                  'web_url', 'scm_url', 'webscm_url', 'maintainers')
> >> > -        read_only_fields = ('name', 'maintainers')
> >> > +                  'web_url', 'scm_url', 'webscm_url', 'maintainers',
> >> > +                  'subject_match')
> >> > +        read_only_fields = ('name', 'maintainers', 'subject_match')
> >> >          extra_kwargs = {
> >> >              'url': {'view_name': 'api-project-detail'},
> >> >          }
> >> > diff --git a/patchwork/migrations/0021_add_subject_match_to_project.py b/patchwork/migrations/0021_add_subject_match_to_project.py
> >> > new file mode 100644
> >> > index 0000000..6066266
> >> > --- /dev/null
> >> > +++ b/patchwork/migrations/0021_add_subject_match_to_project.py
> >> > @@ -0,0 +1,29 @@
> >> > +# -*- coding: utf-8 -*-
> >> > +# Generated by Django 1.10.8 on 2018-01-19 18:16
> >> > +from __future__ import unicode_literals
> >> > +
> >> > +from django.db import migrations, models
> >> > +
> >> > +
> >> > +class Migration(migrations.Migration):
> >> > +
> >> > +    dependencies = [
> >> > +        ('patchwork', '0020_tag_show_column'),
> >> > +    ]
> >> > +
> >> > +    operations = [
> >> > +        migrations.AddField(
> >> > +            model_name='project',
> >> > +            name='subject_match',
> >> > +            field=models.CharField(blank=True, default=b'', help_text=b'Regex to match the subject against if only part of emails sent to the list belongs to this project. Will be used with IGNORECASE and MULTILINE flags. If rules for more projects match the first one returned from DB is chosen.', max_length=64),
> >> > +        ),
> >> > +        migrations.AlterField(
> >> > +            model_name='project',
> >> > +            name='listid',
> >> > +            field=models.CharField(max_length=255),
> >> > +        ),
> >> > +        migrations.AlterUniqueTogether(
> >> > +            name='project',
> >> > +            unique_together=set([('listid', 'subject_match')]),
> >> > +        ),
> >> > +    ]
> >> > diff --git a/patchwork/models.py b/patchwork/models.py
> >> > index 11886f1..907707f 100644
> >> > --- a/patchwork/models.py
> >> > +++ b/patchwork/models.py
> >> > @@ -71,8 +71,14 @@ class Project(models.Model):
> >> >  
> >> >      linkname = models.CharField(max_length=255, unique=True)
> >> >      name = models.CharField(max_length=255, unique=True)
> >> > -    listid = models.CharField(max_length=255, unique=True)
> >> > +    listid = models.CharField(max_length=255)
> >> >      listemail = models.CharField(max_length=200)
> >> > +    subject_match = models.CharField(
> >> > +        max_length=64, blank=True, default='', help_text='Regex to match the '
> >> > +        'subject against if only part of emails sent to the list belongs to '
> >> > +        'this project. Will be used with IGNORECASE and MULTILINE flags. If '
> >> > +        'rules for more projects match the first one returned from DB is '
> >> > +        'chosen.')
> >> >  
> >> >      # url metadata
> >> >  
> >> > @@ -100,6 +106,7 @@ class Project(models.Model):
> >> >          return self.name
> >> >  
> >> >      class Meta:
> >> > +        unique_together = (('listid', 'subject_match'),)
> >> >          ordering = ['linkname']
> >> >  
> >> >  
> >> > diff --git a/patchwork/parser.py b/patchwork/parser.py
> >> > index ac7dc5f..015f709 100644
> >> > --- a/patchwork/parser.py
> >> > +++ b/patchwork/parser.py
> >> > @@ -157,9 +157,25 @@ def find_project_by_id(list_id):
> >> >          project = Project.objects.get(listid=list_id)
> >> >      except Project.DoesNotExist:
> >> >          logger.debug("'%s' if not a valid project list-id", list_id)
> >> > +    except Project.MultipleObjectsReturned:
> >> > +        logger.debug("Multiple projects with list-id '%s' found", list_id)
> >> >      return project
> >> >  
> >> >  
> >> > +def find_project_by_id_and_subject(list_id, subject):
> >> > +    """Find a `project` object based on `list_id` and subject prefix match."""
> >> > +    projects = Project.objects.filter(listid=list_id)
> >> > +    for project in projects:
> >> > +        if (not project.subject_match or
> >> > +                re.search(project.subject_match, subject,
> >> > +                          re.MULTILINE | re.IGNORECASE)):
> >> > +            return project
> >> > +
> >> > +    logger.debug("No project to match email with list-id '%s', subject '%s' "
> >> > +                 "found", list_id, subject)
> >> > +    return None
> >> > +
> >> > +
> >> >  def find_project_by_header(mail):
> >> >      project = None
> >> >      listid_res = [re.compile(r'.*<([^>]+)>.*', re.S),
> >> > @@ -181,7 +197,8 @@ def find_project_by_header(mail):
> >> >  
> >> >              listid = match.group(1)
> >> >  
> >> > -            project = find_project_by_id(listid)
> >> > +            project = find_project_by_id_and_subject(listid,
> >> > +                                                     mail.get('Subject'))
> >> >              if project:
> >> >                  break
> >> >  
> >> > diff --git a/releasenotes/notes/list-filtering-4643d98b4064367a.yaml b/releasenotes/notes/list-filtering-4643d98b4064367a.yaml
> >> > new file mode 100644
> >> > index 0000000..21c1680
> >> > --- /dev/null
> >> > +++ b/releasenotes/notes/list-filtering-4643d98b4064367a.yaml
> >> > @@ -0,0 +1,10 @@
> >> > +---
> >> > +features:
> >> > +  - |
> >> > +    Allow list filtering into multiple projects (and email dropping) based on
> >> > +    subject prefixes. Disabled by default, enable by specifying a regular
> >> > +    expression which needs to be matched in the subject on a per-project basis
> >> > +    (field `subject_match`).
> >> > +api:
> >> > +  - |
> >> > +    Expose `subject_match` in REST API.
> >> > -- 
> >> > 2.13.6
> >> >
> >> > _______________________________________________
> >> > Patchwork mailing list
> >> > Patchwork@lists.ozlabs.org
> >> > https://lists.ozlabs.org/listinfo/patchwork
> >> _______________________________________________
> >> Patchwork mailing list
> >> Patchwork@lists.ozlabs.org
> >> https://lists.ozlabs.org/listinfo/patchwork
Daniel Axtens Jan. 31, 2018, 2:25 a.m. UTC | #5
Don Zickus <dzickus@redhat.com> writes:

> On Mon, Jan 29, 2018 at 11:36:36PM +1100, Daniel Axtens wrote:
>> Hi Don,
>> 
>> >> I suppose to put a finer point on it - what is your usecase here? What
>> >> are you trying to achieve, and can we help you do that in a way that
>> >> requires smaller changes to Patchwork, and is less fragile? (For example
>> >> this is going to break if someone mis-spells the keyword you're looking
>> >> for in the subject_match.)
>> >
>> > So here is our use case.  Internally at Red Hat we use one mailing list to
>> > post all of our kernel patches.  However, being a distro company, patches
>> > can be applied to either RHEL-6,7,X.  For the last 15 years we have always
>> > used the method:
>> >
>> > [RHEL-7 PATCH] instead of [PATCH].
>> 
>> Ah yes. I work for Canonical (although I do Patchwork in a private
>> capacity only) and our kernel team does something very similar.
>> 
>> > The project inside the []s has been what we filter through our regex.  Is it
>> > prone to human typos?  Yes.  Most folks have stuck this in the .git/config
>> > subjectprefix option.  That limited the typos. It isn't perfect.
>> >
>> > We have been using a hacked up PatchWork1 for the last 7 or so years.  This
>> > is one of those features we need (or something to solve our problem) if we
>> > want to migrate to a PatchWork2 instance.
>> >
>> > I hope that adds a little bit of context on our thinking.  Thoughts?
>> 
>> That's a compelling use-case and I'm happy to look at supporting it. I
>> will review the patch more closely in the coming days.
>
> Thanks for your understanding and support!
>
> Again, we know the solution has its 'human' limitations. :-) We just never
> came up with a better idea. So any ideas there are welcomed too! :-)

[I will eventually get around to reviewing the patch proper, this is just spitballing.]

One option that came to me last night would be to do this filtering
before the emails are injested into patchwork. To elaborate:

I assume you're injesting mail using something like the setup described
at
http://patchwork.readthedocs.io/en/master/deployment/installation/#mail-transfer-agent-mta
that is:

$ sudo cat << EOF > /etc/aliases
patchwork: "|/opt/patchwork/patchwork/bin/parsemail.sh"
EOF

So what you could do is pass the email to something like procmail first,
let that filter the messages on your regexes, and then pass the filtered
mail to parsemail.sh, using an explicit list-id parameter to deliver it
to the right project.

This doesn't involve patchwork at all, which is both a strength (it's
simple for me) and a weakness (it's complex for you and involves
spreading config across 2 places).

Regards,
Daniel
>
> Cheers,
> Don
>
>> 
>> Regards,
>> Daniel
>> 
>> > Cheers,
>> > Don
>> >
>> >> 
>> >> Regards,
>> >> Daniel
>> >> 
>> >> >
>> >> > Signed-off-by: Veronika Kabatova <vkabatov@redhat.com>
>> >> > ---
>> >> >  docs/api.yaml                                      |  3 +++
>> >> >  docs/deployment/management.rst                     |  4 +--
>> >> >  patchwork/api/project.py                           |  5 ++--
>> >> >  .../0021_add_subject_match_to_project.py           | 29 ++++++++++++++++++++++
>> >> >  patchwork/models.py                                |  9 ++++++-
>> >> >  patchwork/parser.py                                | 19 +++++++++++++-
>> >> >  .../notes/list-filtering-4643d98b4064367a.yaml     | 10 ++++++++
>> >> >  7 files changed, 73 insertions(+), 6 deletions(-)
>> >> >  create mode 100644 patchwork/migrations/0021_add_subject_match_to_project.py
>> >> >  create mode 100644 releasenotes/notes/list-filtering-4643d98b4064367a.yaml
>> >> >
>> >> > diff --git a/docs/api.yaml b/docs/api.yaml
>> >> > index 3e79f0b..3373226 100644
>> >> > --- a/docs/api.yaml
>> >> > +++ b/docs/api.yaml
>> >> > @@ -374,6 +374,9 @@ definitions:
>> >> >        list_id:
>> >> >          type: string
>> >> >          description: Mailing list identifier for project.
>> >> > +      subject_match:
>> >> > +        type: string
>> >> > +        description: Regex used for email filtering.
>> >> >        list_email:
>> >> >          type: string
>> >> >          description: Mailing list email address for project.
>> >> > diff --git a/docs/deployment/management.rst b/docs/deployment/management.rst
>> >> > index c50b7b6..870d7ee 100644
>> >> > --- a/docs/deployment/management.rst
>> >> > +++ b/docs/deployment/management.rst
>> >> > @@ -63,7 +63,7 @@ due to, for example, an outage.
>> >> >  .. option:: --list-id <list-id>
>> >> >  
>> >> >     mailing list ID. If not supplied, this will be extracted from the mail
>> >> > -   headers.
>> >> > +   headers. Don't use this option if you require filtering based on subjects.
>> >> >  
>> >> >  .. option:: infile
>> >> >  
>> >> > @@ -88,7 +88,7 @@ the :ref:`deployment installation guide <deployment-parsemail>`.
>> >> >  .. option:: --list-id <list-id>
>> >> >  
>> >> >     mailing list ID. If not supplied, this will be extracted from the mail
>> >> > -   headers.
>> >> > +   headers. Don't use this option if you require filtering based on subjects.
>> >> >  
>> >> >  .. option:: infile
>> >> >  
>> >> > diff --git a/patchwork/api/project.py b/patchwork/api/project.py
>> >> > index 446c473..597f605 100644
>> >> > --- a/patchwork/api/project.py
>> >> > +++ b/patchwork/api/project.py
>> >> > @@ -39,8 +39,9 @@ class ProjectSerializer(HyperlinkedModelSerializer):
>> >> >      class Meta:
>> >> >          model = Project
>> >> >          fields = ('id', 'url', 'name', 'link_name', 'list_id', 'list_email',
>> >> > -                  'web_url', 'scm_url', 'webscm_url', 'maintainers')
>> >> > -        read_only_fields = ('name', 'maintainers')
>> >> > +                  'web_url', 'scm_url', 'webscm_url', 'maintainers',
>> >> > +                  'subject_match')
>> >> > +        read_only_fields = ('name', 'maintainers', 'subject_match')
>> >> >          extra_kwargs = {
>> >> >              'url': {'view_name': 'api-project-detail'},
>> >> >          }
>> >> > diff --git a/patchwork/migrations/0021_add_subject_match_to_project.py b/patchwork/migrations/0021_add_subject_match_to_project.py
>> >> > new file mode 100644
>> >> > index 0000000..6066266
>> >> > --- /dev/null
>> >> > +++ b/patchwork/migrations/0021_add_subject_match_to_project.py
>> >> > @@ -0,0 +1,29 @@
>> >> > +# -*- coding: utf-8 -*-
>> >> > +# Generated by Django 1.10.8 on 2018-01-19 18:16
>> >> > +from __future__ import unicode_literals
>> >> > +
>> >> > +from django.db import migrations, models
>> >> > +
>> >> > +
>> >> > +class Migration(migrations.Migration):
>> >> > +
>> >> > +    dependencies = [
>> >> > +        ('patchwork', '0020_tag_show_column'),
>> >> > +    ]
>> >> > +
>> >> > +    operations = [
>> >> > +        migrations.AddField(
>> >> > +            model_name='project',
>> >> > +            name='subject_match',
>> >> > +            field=models.CharField(blank=True, default=b'', help_text=b'Regex to match the subject against if only part of emails sent to the list belongs to this project. Will be used with IGNORECASE and MULTILINE flags. If rules for more projects match the first one returned from DB is chosen.', max_length=64),
>> >> > +        ),
>> >> > +        migrations.AlterField(
>> >> > +            model_name='project',
>> >> > +            name='listid',
>> >> > +            field=models.CharField(max_length=255),
>> >> > +        ),
>> >> > +        migrations.AlterUniqueTogether(
>> >> > +            name='project',
>> >> > +            unique_together=set([('listid', 'subject_match')]),
>> >> > +        ),
>> >> > +    ]
>> >> > diff --git a/patchwork/models.py b/patchwork/models.py
>> >> > index 11886f1..907707f 100644
>> >> > --- a/patchwork/models.py
>> >> > +++ b/patchwork/models.py
>> >> > @@ -71,8 +71,14 @@ class Project(models.Model):
>> >> >  
>> >> >      linkname = models.CharField(max_length=255, unique=True)
>> >> >      name = models.CharField(max_length=255, unique=True)
>> >> > -    listid = models.CharField(max_length=255, unique=True)
>> >> > +    listid = models.CharField(max_length=255)
>> >> >      listemail = models.CharField(max_length=200)
>> >> > +    subject_match = models.CharField(
>> >> > +        max_length=64, blank=True, default='', help_text='Regex to match the '
>> >> > +        'subject against if only part of emails sent to the list belongs to '
>> >> > +        'this project. Will be used with IGNORECASE and MULTILINE flags. If '
>> >> > +        'rules for more projects match the first one returned from DB is '
>> >> > +        'chosen.')
>> >> >  
>> >> >      # url metadata
>> >> >  
>> >> > @@ -100,6 +106,7 @@ class Project(models.Model):
>> >> >          return self.name
>> >> >  
>> >> >      class Meta:
>> >> > +        unique_together = (('listid', 'subject_match'),)
>> >> >          ordering = ['linkname']
>> >> >  
>> >> >  
>> >> > diff --git a/patchwork/parser.py b/patchwork/parser.py
>> >> > index ac7dc5f..015f709 100644
>> >> > --- a/patchwork/parser.py
>> >> > +++ b/patchwork/parser.py
>> >> > @@ -157,9 +157,25 @@ def find_project_by_id(list_id):
>> >> >          project = Project.objects.get(listid=list_id)
>> >> >      except Project.DoesNotExist:
>> >> >          logger.debug("'%s' if not a valid project list-id", list_id)
>> >> > +    except Project.MultipleObjectsReturned:
>> >> > +        logger.debug("Multiple projects with list-id '%s' found", list_id)
>> >> >      return project
>> >> >  
>> >> >  
>> >> > +def find_project_by_id_and_subject(list_id, subject):
>> >> > +    """Find a `project` object based on `list_id` and subject prefix match."""
>> >> > +    projects = Project.objects.filter(listid=list_id)
>> >> > +    for project in projects:
>> >> > +        if (not project.subject_match or
>> >> > +                re.search(project.subject_match, subject,
>> >> > +                          re.MULTILINE | re.IGNORECASE)):
>> >> > +            return project
>> >> > +
>> >> > +    logger.debug("No project to match email with list-id '%s', subject '%s' "
>> >> > +                 "found", list_id, subject)
>> >> > +    return None
>> >> > +
>> >> > +
>> >> >  def find_project_by_header(mail):
>> >> >      project = None
>> >> >      listid_res = [re.compile(r'.*<([^>]+)>.*', re.S),
>> >> > @@ -181,7 +197,8 @@ def find_project_by_header(mail):
>> >> >  
>> >> >              listid = match.group(1)
>> >> >  
>> >> > -            project = find_project_by_id(listid)
>> >> > +            project = find_project_by_id_and_subject(listid,
>> >> > +                                                     mail.get('Subject'))
>> >> >              if project:
>> >> >                  break
>> >> >  
>> >> > diff --git a/releasenotes/notes/list-filtering-4643d98b4064367a.yaml b/releasenotes/notes/list-filtering-4643d98b4064367a.yaml
>> >> > new file mode 100644
>> >> > index 0000000..21c1680
>> >> > --- /dev/null
>> >> > +++ b/releasenotes/notes/list-filtering-4643d98b4064367a.yaml
>> >> > @@ -0,0 +1,10 @@
>> >> > +---
>> >> > +features:
>> >> > +  - |
>> >> > +    Allow list filtering into multiple projects (and email dropping) based on
>> >> > +    subject prefixes. Disabled by default, enable by specifying a regular
>> >> > +    expression which needs to be matched in the subject on a per-project basis
>> >> > +    (field `subject_match`).
>> >> > +api:
>> >> > +  - |
>> >> > +    Expose `subject_match` in REST API.
>> >> > -- 
>> >> > 2.13.6
>> >> >
>> >> > _______________________________________________
>> >> > Patchwork mailing list
>> >> > Patchwork@lists.ozlabs.org
>> >> > https://lists.ozlabs.org/listinfo/patchwork
>> >> _______________________________________________
>> >> Patchwork mailing list
>> >> Patchwork@lists.ozlabs.org
>> >> https://lists.ozlabs.org/listinfo/patchwork
Stephen Finucane Jan. 31, 2018, 10:25 a.m. UTC | #6
On Wed, 2018-01-31 at 13:25 +1100, Daniel Axtens wrote:
> Don Zickus <dzickus@redhat.com> writes:
> 
> > On Mon, Jan 29, 2018 at 11:36:36PM +1100, Daniel Axtens wrote:
> > > Hi Don,
> > > 
> > > > > I suppose to put a finer point on it - what is your usecase here? What
> > > > > are you trying to achieve, and can we help you do that in a way that
> > > > > requires smaller changes to Patchwork, and is less fragile? (For example
> > > > > this is going to break if someone mis-spells the keyword you're looking
> > > > > for in the subject_match.)
> > > > 
> > > > So here is our use case.  Internally at Red Hat we use one mailing list to
> > > > post all of our kernel patches.  However, being a distro company, patches
> > > > can be applied to either RHEL-6,7,X.  For the last 15 years we have always
> > > > used the method:
> > > > 
> > > > [RHEL-7 PATCH] instead of [PATCH].
> > > 
> > > Ah yes. I work for Canonical (although I do Patchwork in a private
> > > capacity only) and our kernel team does something very similar.
> > > 
> > > > The project inside the []s has been what we filter through our regex.  Is it
> > > > prone to human typos?  Yes.  Most folks have stuck this in the .git/config
> > > > subjectprefix option.  That limited the typos. It isn't perfect.
> > > > 
> > > > We have been using a hacked up PatchWork1 for the last 7 or so years.  This
> > > > is one of those features we need (or something to solve our problem) if we
> > > > want to migrate to a PatchWork2 instance.
> > > > 
> > > > I hope that adds a little bit of context on our thinking.  Thoughts?
> > > 
> > > That's a compelling use-case and I'm happy to look at supporting it. I
> > > will review the patch more closely in the coming days.
> > 
> > Thanks for your understanding and support!
> > 
> > Again, we know the solution has its 'human' limitations. :-) We just never
> > came up with a better idea. So any ideas there are welcomed too! :-)
> 
> [I will eventually get around to reviewing the patch proper, this is just spitballing.]
> 
> One option that came to me last night would be to do this filtering
> before the emails are injested into patchwork. To elaborate:
> 
> I assume you're injesting mail using something like the setup described
> at
> http://patchwork.readthedocs.io/en/master/deployment/installation/#mail-transfer-agent-mta
> that is:
> 
> $ sudo cat << EOF > /etc/aliases
> patchwork: "|/opt/patchwork/patchwork/bin/parsemail.sh"
> EOF
> 
> So what you could do is pass the email to something like procmail first,
> let that filter the messages on your regexes, and then pass the filtered
> mail to parsemail.sh, using an explicit list-id parameter to deliver it
> to the right project.

What would be the advantage of this though? Far as I can tell, there is
a clear pattern here for mailing lists that are shared by multiple
projects, namely, the use of a specific subject tag. I've seen this
pattern in use on other development mailing lists (openstack-dev jumps
to mind).

> This doesn't involve patchwork at all, which is both a strength (it's
> simple for me) and a weakness (it's complex for you and involves
> spreading config across 2 places).

To be honest, given how simple and generic this patch is, I'd prefer to
keep the logic within Patchwork. It would require much less work on an
administrators end over all (seeing as they'll have to configure
procmail too), and be far more transparent to end users.

Thoughts?

Stephen
Veronika Kabatova Jan. 31, 2018, 12:17 p.m. UTC | #7
----- Original Message -----
> From: "Stephen Finucane" <stephen@that.guru>
> To: "Daniel Axtens" <dja@axtens.net>, "Don Zickus" <dzickus@redhat.com>
> Cc: patchwork@lists.ozlabs.org
> Sent: Wednesday, January 31, 2018 11:25:03 AM
> Subject: Re: [PATCH] Implement list filtering
> 
> On Wed, 2018-01-31 at 13:25 +1100, Daniel Axtens wrote:
> > Don Zickus <dzickus@redhat.com> writes:
> > 
> > > On Mon, Jan 29, 2018 at 11:36:36PM +1100, Daniel Axtens wrote:
> > > > Hi Don,
> > > > 
> > > > > > I suppose to put a finer point on it - what is your usecase here?
> > > > > > What
> > > > > > are you trying to achieve, and can we help you do that in a way
> > > > > > that
> > > > > > requires smaller changes to Patchwork, and is less fragile? (For
> > > > > > example
> > > > > > this is going to break if someone mis-spells the keyword you're
> > > > > > looking
> > > > > > for in the subject_match.)
> > > > > 
> > > > > So here is our use case.  Internally at Red Hat we use one mailing
> > > > > list to
> > > > > post all of our kernel patches.  However, being a distro company,
> > > > > patches
> > > > > can be applied to either RHEL-6,7,X.  For the last 15 years we have
> > > > > always
> > > > > used the method:
> > > > > 
> > > > > [RHEL-7 PATCH] instead of [PATCH].
> > > > 
> > > > Ah yes. I work for Canonical (although I do Patchwork in a private
> > > > capacity only) and our kernel team does something very similar.
> > > > 
> > > > > The project inside the []s has been what we filter through our regex.
> > > > > Is it
> > > > > prone to human typos?  Yes.  Most folks have stuck this in the
> > > > > .git/config
> > > > > subjectprefix option.  That limited the typos. It isn't perfect.
> > > > > 
> > > > > We have been using a hacked up PatchWork1 for the last 7 or so years.
> > > > > This
> > > > > is one of those features we need (or something to solve our problem)
> > > > > if we
> > > > > want to migrate to a PatchWork2 instance.
> > > > > 
> > > > > I hope that adds a little bit of context on our thinking.  Thoughts?
> > > > 
> > > > That's a compelling use-case and I'm happy to look at supporting it. I
> > > > will review the patch more closely in the coming days.
> > > 
> > > Thanks for your understanding and support!
> > > 
> > > Again, we know the solution has its 'human' limitations. :-) We just
> > > never
> > > came up with a better idea. So any ideas there are welcomed too! :-)
> > 
> > [I will eventually get around to reviewing the patch proper, this is just
> > spitballing.]
> > 
> > One option that came to me last night would be to do this filtering
> > before the emails are injested into patchwork. To elaborate:
> > 
> > I assume you're injesting mail using something like the setup described
> > at
> > http://patchwork.readthedocs.io/en/master/deployment/installation/#mail-transfer-agent-mta
> > that is:
> > 
> > $ sudo cat << EOF > /etc/aliases
> > patchwork: "|/opt/patchwork/patchwork/bin/parsemail.sh"
> > EOF
> > 
> > So what you could do is pass the email to something like procmail first,
> > let that filter the messages on your regexes, and then pass the filtered
> > mail to parsemail.sh, using an explicit list-id parameter to deliver it
> > to the right project.
> 

We are using the getmail method and previously did something similar
but weren't very happy with that solution since it was harder to
manage and required direct access to the machine.

> What would be the advantage of this though? Far as I can tell, there is
> a clear pattern here for mailing lists that are shared by multiple
> projects, namely, the use of a specific subject tag. I've seen this
> pattern in use on other development mailing lists (openstack-dev jumps
> to mind).
> 
> > This doesn't involve patchwork at all, which is both a strength (it's
> > simple for me) and a weakness (it's complex for you and involves
> > spreading config across 2 places).
> 
> To be honest, given how simple and generic this patch is, I'd prefer to
> keep the logic within Patchwork. It would require much less work on an
> administrators end over all (seeing as they'll have to configure
> procmail too), and be far more transparent to end users.
> 

I agree. Our goal is to have Patchwork as a service (container or VM)
managed by DevOps or similar team. This solution would make it uselessly
hard to for example add and edit projects -- instead of having our
Patchwork admin user click a few times, we would need to contact those
people and explain to them how to change the configuration with every
change we need. And as Stephen said, it's really not a straightforward
and transparent solution.

Veronika

> Thoughts?
> 
> Stephen
> _______________________________________________
> Patchwork mailing list
> Patchwork@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/patchwork
>
Daniel Axtens Jan. 31, 2018, 8:39 p.m. UTC | #8
Hi all,

>> To be honest, given how simple and generic this patch is, I'd prefer to
>> keep the logic within Patchwork. It would require much less work on an
>> administrators end over all (seeing as they'll have to configure
>> procmail too), and be far more transparent to end users.
>> 
>
> I agree. Our goal is to have Patchwork as a service (container or VM)
> managed by DevOps or similar team. This solution would make it uselessly
> hard to for example add and edit projects -- instead of having our
> Patchwork admin user click a few times, we would need to contact those
> people and explain to them how to change the configuration with every
> change we need. And as Stephen said, it's really not a straightforward
> and transparent solution.

OK, sold on the PW solution now. Will look at the code itself shortly.

Regards,
Daniel


>
> Veronika
>
>> Thoughts?
>> 
>> Stephen
>> _______________________________________________
>> Patchwork mailing list
>> Patchwork@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/patchwork
>>
Daniel Axtens Feb. 5, 2018, 3:24 p.m. UTC | #9
Hi Veronika,

So, having been convinced of the merits of this approach, here's my code
review.

> Sometimes, multiple projects reside at the same mailing list. So far,
> Patchwork only allowed a single project per mailing list, which made it
> impossible for these projects to use Patchwork (unless they did some
> dirty hacks).
>
> Add a new property `subject_match` to projects and implement filtering
> on (list_id, subject_match) match instead of solely list_id. Instance
> admin can specify a regex on a per-project basis when the project is
> created.

Ok, so let me see if I have the logic of this patch right.

A message comes in.

A set of possible projects is found based on the list-id.

Then, starting from the first one returned by the db (with no intrinsic
ordering), the subject_match field is examined:
 - if subject_match is empty, this project is returned as a match
 - if subject_match is non-empty, the project is returned as a match if
   the subject matches
 
If this understanding is incorrect, bail out here :)

Consider a list like netdev. It has at least a couple of userspace
projects (iproute2, ethtool) that could conceivably be spun off into
their own lists, and which could use subject matches, but most patches
are for the linux kernel and do not have usable subject matches.

As I understand it, you would need to have 3 projects:
 - for iproute2, subject-match \[[^\]]*iproute[^\]]*\]
 - for ethtool, subject-match \[[^\]]*ethtool[^\]]*\]
Then how would you match the rest of the list? You could have an empty
subject-match, but then if that is loaded first out of the database, it
would match first and patches would never make it to the iproute2 or
ethtool projects.

I think you need some sort of default or fall-back or last-resort - the
name is not important: just that it will match *only* if no other
project matches.

Does that sound right?

I have some other minor comments which I have included throughout:

> diff --git a/docs/api.yaml b/docs/api.yaml
> index 3e79f0b..3373226 100644
> --- a/docs/api.yaml
> +++ b/docs/api.yaml
> @@ -374,6 +374,9 @@ definitions:
>        list_id:
>          type: string
>          description: Mailing list identifier for project.
> +      subject_match:
> +        type: string
> +        description: Regex used for email filtering.
This is good.

> diff --git a/docs/deployment/management.rst b/docs/deployment/management.rst
> index c50b7b6..870d7ee 100644
> --- a/docs/deployment/management.rst
> +++ b/docs/deployment/management.rst
> @@ -63,7 +63,7 @@ due to, for example, an outage.
>  .. option:: --list-id <list-id>
>  
>     mailing list ID. If not supplied, this will be extracted from the mail
> -   headers.
> +   headers. Don't use this option if you require filtering based on subjects.
>  
>  .. option:: infile

I don't understand why this would interfere with subject
filtering. I notice below that you've added a new method that matches by
list-id and subject and kept the old list-id matching one - does the
command-line option use the old method?

I think I would prefer that there only be one system of mapping mails to
projects and for this restriction to go away, but if there's a good
reason for it to stay I am always open to persuasion.

> diff --git a/patchwork/api/project.py b/patchwork/api/project.py
> index 446c473..597f605 100644
> --- a/patchwork/api/project.py
> +++ b/patchwork/api/project.py
> @@ -39,8 +39,9 @@ class ProjectSerializer(HyperlinkedModelSerializer):
>      class Meta:
>          model = Project
>          fields = ('id', 'url', 'name', 'link_name', 'list_id', 'list_email',
> -                  'web_url', 'scm_url', 'webscm_url', 'maintainers')
> -        read_only_fields = ('name', 'maintainers')
> +                  'web_url', 'scm_url', 'webscm_url', 'maintainers',
> +                  'subject_match')
> +        read_only_fields = ('name', 'maintainers', 'subject_match')
This looks good - albeit I am not an API expert.

> diff --git a/patchwork/models.py b/patchwork/models.py
> index 11886f1..907707f 100644
> --- a/patchwork/models.py
> +++ b/patchwork/models.py
> @@ -71,8 +71,14 @@ class Project(models.Model):
>  
>      linkname = models.CharField(max_length=255, unique=True)
>      name = models.CharField(max_length=255, unique=True)
> -    listid = models.CharField(max_length=255, unique=True)
> +    listid = models.CharField(max_length=255)
>      listemail = models.CharField(max_length=200)
> +    subject_match = models.CharField(
> +        max_length=64, blank=True, default='', help_text='Regex to match the '
> +        'subject against if only part of emails sent to the list belongs to '
> +        'this project. Will be used with IGNORECASE and MULTILINE flags. If '
> +        'rules for more projects match the first one returned from DB is '
> +        'chosen.')

I don't want to bikeshed this too much, but it definitely needs to say
that a blank string matches everything.

> diff --git a/patchwork/parser.py b/patchwork/parser.py
> index ac7dc5f..015f709 100644
> --- a/patchwork/parser.py
> +++ b/patchwork/parser.py
> @@ -157,9 +157,25 @@ def find_project_by_id(list_id):
>          project = Project.objects.get(listid=list_id)
>      except Project.DoesNotExist:
>          logger.debug("'%s' if not a valid project list-id", list_id)
> +    except Project.MultipleObjectsReturned:
> +        logger.debug("Multiple projects with list-id '%s' found", list_id)
>      return project
>  
>  
> +def find_project_by_id_and_subject(list_id, subject):
> +    """Find a `project` object based on `list_id` and subject prefix match."""
> +    projects = Project.objects.filter(listid=list_id)
> +    for project in projects:
> +        if (not project.subject_match or
> +                re.search(project.subject_match, subject,
> +                          re.MULTILINE | re.IGNORECASE)):
> +            return project
> +
> +    logger.debug("No project to match email with list-id '%s', subject '%s' "
> +                 "found", list_id, subject)
> +    return None
> +
> +
>  def find_project_by_header(mail):
>      project = None
>      listid_res = [re.compile(r'.*<([^>]+)>.*', re.S),
> @@ -181,7 +197,8 @@ def find_project_by_header(mail):
>  
>              listid = match.group(1)
>  
> -            project = find_project_by_id(listid)
> +            project = find_project_by_id_and_subject(listid,
> +                                                     mail.get('Subject'))
>              if project:
>                  break
Again here there is the 2 ways to find a project. It does look like
subject matching only happens if you don't specify --list-id and I think
that is probably not the approach I want to take.

Patchwork also extracts the bits of the subject between square brackets
- would matching on that be more appropriate? (See subject_prefixes in
parser.py.) That would mean the regex will automatically select for
things in square brackets without having to encode that in the regex,
which might avoid some user confusion and misdirected mail.

Lastly, please could you also add a simple test case for this: doesn't need to
be big, just enough to show that the feature works.

Thanks again for your contribution to Patchwork!

Regards,
Daniel
Don Zickus Feb. 5, 2018, 5:14 p.m. UTC | #10
On Tue, Feb 06, 2018 at 02:24:21AM +1100, Daniel Axtens wrote:
> Hi Veronika,
> 
> So, having been convinced of the merits of this approach, here's my code
> review.
> 
> > Sometimes, multiple projects reside at the same mailing list. So far,
> > Patchwork only allowed a single project per mailing list, which made it
> > impossible for these projects to use Patchwork (unless they did some
> > dirty hacks).
> >
> > Add a new property `subject_match` to projects and implement filtering
> > on (list_id, subject_match) match instead of solely list_id. Instance
> > admin can specify a regex on a per-project basis when the project is
> > created.
> 
> Ok, so let me see if I have the logic of this patch right.
> 
> A message comes in.
> 
> A set of possible projects is found based on the list-id.
> 
> Then, starting from the first one returned by the db (with no intrinsic
> ordering), the subject_match field is examined:
>  - if subject_match is empty, this project is returned as a match
>  - if subject_match is non-empty, the project is returned as a match if
>    the subject matches
>  
> If this understanding is incorrect, bail out here :)
> 
> Consider a list like netdev. It has at least a couple of userspace
> projects (iproute2, ethtool) that could conceivably be spun off into
> their own lists, and which could use subject matches, but most patches
> are for the linux kernel and do not have usable subject matches.
> 
> As I understand it, you would need to have 3 projects:
>  - for iproute2, subject-match \[[^\]]*iproute[^\]]*\]
>  - for ethtool, subject-match \[[^\]]*ethtool[^\]]*\]
> Then how would you match the rest of the list? You could have an empty
> subject-match, but then if that is loaded first out of the database, it
> would match first and patches would never make it to the iproute2 or
> ethtool projects.
> 
> I think you need some sort of default or fall-back or last-resort - the
> name is not important: just that it will match *only* if no other
> project matches.


Good point.  Internally in Red Hat we had a default bucket that caught stuff
like this because regex's can't capture 100% of human behaviour (humans are
great at doing dumb things :-) ).

Periodically we just reviewed the bucket and manually moved patches to the
right project.  I think that is what you are thinking, right?

Cheers,
Don



> 
> Does that sound right?
> 
> I have some other minor comments which I have included throughout:
> 
> > diff --git a/docs/api.yaml b/docs/api.yaml
> > index 3e79f0b..3373226 100644
> > --- a/docs/api.yaml
> > +++ b/docs/api.yaml
> > @@ -374,6 +374,9 @@ definitions:
> >        list_id:
> >          type: string
> >          description: Mailing list identifier for project.
> > +      subject_match:
> > +        type: string
> > +        description: Regex used for email filtering.
> This is good.
> 
> > diff --git a/docs/deployment/management.rst b/docs/deployment/management.rst
> > index c50b7b6..870d7ee 100644
> > --- a/docs/deployment/management.rst
> > +++ b/docs/deployment/management.rst
> > @@ -63,7 +63,7 @@ due to, for example, an outage.
> >  .. option:: --list-id <list-id>
> >  
> >     mailing list ID. If not supplied, this will be extracted from the mail
> > -   headers.
> > +   headers. Don't use this option if you require filtering based on subjects.
> >  
> >  .. option:: infile
> 
> I don't understand why this would interfere with subject
> filtering. I notice below that you've added a new method that matches by
> list-id and subject and kept the old list-id matching one - does the
> command-line option use the old method?
> 
> I think I would prefer that there only be one system of mapping mails to
> projects and for this restriction to go away, but if there's a good
> reason for it to stay I am always open to persuasion.
> 
> > diff --git a/patchwork/api/project.py b/patchwork/api/project.py
> > index 446c473..597f605 100644
> > --- a/patchwork/api/project.py
> > +++ b/patchwork/api/project.py
> > @@ -39,8 +39,9 @@ class ProjectSerializer(HyperlinkedModelSerializer):
> >      class Meta:
> >          model = Project
> >          fields = ('id', 'url', 'name', 'link_name', 'list_id', 'list_email',
> > -                  'web_url', 'scm_url', 'webscm_url', 'maintainers')
> > -        read_only_fields = ('name', 'maintainers')
> > +                  'web_url', 'scm_url', 'webscm_url', 'maintainers',
> > +                  'subject_match')
> > +        read_only_fields = ('name', 'maintainers', 'subject_match')
> This looks good - albeit I am not an API expert.
> 
> > diff --git a/patchwork/models.py b/patchwork/models.py
> > index 11886f1..907707f 100644
> > --- a/patchwork/models.py
> > +++ b/patchwork/models.py
> > @@ -71,8 +71,14 @@ class Project(models.Model):
> >  
> >      linkname = models.CharField(max_length=255, unique=True)
> >      name = models.CharField(max_length=255, unique=True)
> > -    listid = models.CharField(max_length=255, unique=True)
> > +    listid = models.CharField(max_length=255)
> >      listemail = models.CharField(max_length=200)
> > +    subject_match = models.CharField(
> > +        max_length=64, blank=True, default='', help_text='Regex to match the '
> > +        'subject against if only part of emails sent to the list belongs to '
> > +        'this project. Will be used with IGNORECASE and MULTILINE flags. If '
> > +        'rules for more projects match the first one returned from DB is '
> > +        'chosen.')
> 
> I don't want to bikeshed this too much, but it definitely needs to say
> that a blank string matches everything.
> 
> > diff --git a/patchwork/parser.py b/patchwork/parser.py
> > index ac7dc5f..015f709 100644
> > --- a/patchwork/parser.py
> > +++ b/patchwork/parser.py
> > @@ -157,9 +157,25 @@ def find_project_by_id(list_id):
> >          project = Project.objects.get(listid=list_id)
> >      except Project.DoesNotExist:
> >          logger.debug("'%s' if not a valid project list-id", list_id)
> > +    except Project.MultipleObjectsReturned:
> > +        logger.debug("Multiple projects with list-id '%s' found", list_id)
> >      return project
> >  
> >  
> > +def find_project_by_id_and_subject(list_id, subject):
> > +    """Find a `project` object based on `list_id` and subject prefix match."""
> > +    projects = Project.objects.filter(listid=list_id)
> > +    for project in projects:
> > +        if (not project.subject_match or
> > +                re.search(project.subject_match, subject,
> > +                          re.MULTILINE | re.IGNORECASE)):
> > +            return project
> > +
> > +    logger.debug("No project to match email with list-id '%s', subject '%s' "
> > +                 "found", list_id, subject)
> > +    return None
> > +
> > +
> >  def find_project_by_header(mail):
> >      project = None
> >      listid_res = [re.compile(r'.*<([^>]+)>.*', re.S),
> > @@ -181,7 +197,8 @@ def find_project_by_header(mail):
> >  
> >              listid = match.group(1)
> >  
> > -            project = find_project_by_id(listid)
> > +            project = find_project_by_id_and_subject(listid,
> > +                                                     mail.get('Subject'))
> >              if project:
> >                  break
> Again here there is the 2 ways to find a project. It does look like
> subject matching only happens if you don't specify --list-id and I think
> that is probably not the approach I want to take.
> 
> Patchwork also extracts the bits of the subject between square brackets
> - would matching on that be more appropriate? (See subject_prefixes in
> parser.py.) That would mean the regex will automatically select for
> things in square brackets without having to encode that in the regex,
> which might avoid some user confusion and misdirected mail.
> 
> Lastly, please could you also add a simple test case for this: doesn't need to
> be big, just enough to show that the feature works.
> 
> Thanks again for your contribution to Patchwork!
> 
> Regards,
> Daniel
> _______________________________________________
> Patchwork mailing list
> Patchwork@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/patchwork
Veronika Kabatova Feb. 5, 2018, 7 p.m. UTC | #11
----- Original Message -----
> From: "Daniel Axtens" <dja@axtens.net>
> To: vkabatov@redhat.com, patchwork@lists.ozlabs.org
> Sent: Monday, February 5, 2018 4:24:21 PM
> Subject: Re: [PATCH] Implement list filtering
> 
> Hi Veronika,
> 
> So, having been convinced of the merits of this approach, here's my code
> review.
> 

Hi, thanks for the comments!

> > Sometimes, multiple projects reside at the same mailing list. So far,
> > Patchwork only allowed a single project per mailing list, which made it
> > impossible for these projects to use Patchwork (unless they did some
> > dirty hacks).
> >
> > Add a new property `subject_match` to projects and implement filtering
> > on (list_id, subject_match) match instead of solely list_id. Instance
> > admin can specify a regex on a per-project basis when the project is
> > created.
> 
> Ok, so let me see if I have the logic of this patch right.
> 
> A message comes in.
> 
> A set of possible projects is found based on the list-id.
> 
> Then, starting from the first one returned by the db (with no intrinsic
> ordering), the subject_match field is examined:
>  - if subject_match is empty, this project is returned as a match
>  - if subject_match is non-empty, the project is returned as a match if
>    the subject matches
>  
> If this understanding is incorrect, bail out here :)
> 
> Consider a list like netdev. It has at least a couple of userspace
> projects (iproute2, ethtool) that could conceivably be spun off into
> their own lists, and which could use subject matches, but most patches
> are for the linux kernel and do not have usable subject matches.
> 
> As I understand it, you would need to have 3 projects:
>  - for iproute2, subject-match \[[^\]]*iproute[^\]]*\]
>  - for ethtool, subject-match \[[^\]]*ethtool[^\]]*\]
> Then how would you match the rest of the list? You could have an empty
> subject-match, but then if that is loaded first out of the database, it
> would match first and patches would never make it to the iproute2 or
> ethtool projects.
> 

My original plan was that if you need both matching projects and a default
one on top of that you can just do a negative lookahead.

> I think you need some sort of default or fall-back or last-resort - the
> name is not important: just that it will match *only* if no other
> project matches.
> 
> Does that sound right?
> 

Using a default project is doable and makes sense. Would it be acceptable
if we first checked for subject matches and in case no match was found,
checked for a project with same list_id and empty subject_match or do you
have something different in mind?

> I have some other minor comments which I have included throughout:
> 
> > diff --git a/docs/api.yaml b/docs/api.yaml
> > index 3e79f0b..3373226 100644
> > --- a/docs/api.yaml
> > +++ b/docs/api.yaml
> > @@ -374,6 +374,9 @@ definitions:
> >        list_id:
> >          type: string
> >          description: Mailing list identifier for project.
> > +      subject_match:
> > +        type: string
> > +        description: Regex used for email filtering.
> This is good.
> 
> > diff --git a/docs/deployment/management.rst
> > b/docs/deployment/management.rst
> > index c50b7b6..870d7ee 100644
> > --- a/docs/deployment/management.rst
> > +++ b/docs/deployment/management.rst
> > @@ -63,7 +63,7 @@ due to, for example, an outage.
> >  .. option:: --list-id <list-id>
> >  
> >     mailing list ID. If not supplied, this will be extracted from the mail
> > -   headers.
> > +   headers. Don't use this option if you require filtering based on
> > subjects.
> >  
> >  .. option:: infile
> 
> I don't understand why this would interfere with subject
> filtering. I notice below that you've added a new method that matches by
> list-id and subject and kept the old list-id matching one - does the
> command-line option use the old method?
> 
> I think I would prefer that there only be one system of mapping mails to
> projects and for this restriction to go away, but if there's a good
> reason for it to stay I am always open to persuasion.
> 

Yes, the command line option uses the old function. The rationale is that
if there are more projects with same list_id which differ in subject match
fields, how do you know which one to choose if you check for valid project
based only on list_id?

If you check the parse_mail() function, it chooses either the
find_project_by_id() or find_project_by_header() functions based on whether
list_id option was provided, so I kept this behavior and only changed the
helper function for find_project_by_header().

It's not hard to change find_project_by_id() to respect subject matches
too if you want. I just felt that it defeats the purpose of having the
option to skip header matching at altogether.

> > diff --git a/patchwork/api/project.py b/patchwork/api/project.py
> > index 446c473..597f605 100644
> > --- a/patchwork/api/project.py
> > +++ b/patchwork/api/project.py
> > @@ -39,8 +39,9 @@ class ProjectSerializer(HyperlinkedModelSerializer):
> >      class Meta:
> >          model = Project
> >          fields = ('id', 'url', 'name', 'link_name', 'list_id',
> >          'list_email',
> > -                  'web_url', 'scm_url', 'webscm_url', 'maintainers')
> > -        read_only_fields = ('name', 'maintainers')
> > +                  'web_url', 'scm_url', 'webscm_url', 'maintainers',
> > +                  'subject_match')
> > +        read_only_fields = ('name', 'maintainers', 'subject_match')
> This looks good - albeit I am not an API expert.
> 
> > diff --git a/patchwork/models.py b/patchwork/models.py
> > index 11886f1..907707f 100644
> > --- a/patchwork/models.py
> > +++ b/patchwork/models.py
> > @@ -71,8 +71,14 @@ class Project(models.Model):
> >  
> >      linkname = models.CharField(max_length=255, unique=True)
> >      name = models.CharField(max_length=255, unique=True)
> > -    listid = models.CharField(max_length=255, unique=True)
> > +    listid = models.CharField(max_length=255)
> >      listemail = models.CharField(max_length=200)
> > +    subject_match = models.CharField(
> > +        max_length=64, blank=True, default='', help_text='Regex to match
> > the '
> > +        'subject against if only part of emails sent to the list belongs
> > to '
> > +        'this project. Will be used with IGNORECASE and MULTILINE flags.
> > If '
> > +        'rules for more projects match the first one returned from DB is '
> > +        'chosen.')
> 
> I don't want to bikeshed this too much, but it definitely needs to say
> that a blank string matches everything.

Sure, will add.

> 
> > diff --git a/patchwork/parser.py b/patchwork/parser.py
> > index ac7dc5f..015f709 100644
> > --- a/patchwork/parser.py
> > +++ b/patchwork/parser.py
> > @@ -157,9 +157,25 @@ def find_project_by_id(list_id):
> >          project = Project.objects.get(listid=list_id)
> >      except Project.DoesNotExist:
> >          logger.debug("'%s' if not a valid project list-id", list_id)
> > +    except Project.MultipleObjectsReturned:
> > +        logger.debug("Multiple projects with list-id '%s' found", list_id)
> >      return project
> >  
> >  
> > +def find_project_by_id_and_subject(list_id, subject):
> > +    """Find a `project` object based on `list_id` and subject prefix
> > match."""
> > +    projects = Project.objects.filter(listid=list_id)
> > +    for project in projects:
> > +        if (not project.subject_match or
> > +                re.search(project.subject_match, subject,
> > +                          re.MULTILINE | re.IGNORECASE)):
> > +            return project
> > +
> > +    logger.debug("No project to match email with list-id '%s', subject
> > '%s' "
> > +                 "found", list_id, subject)
> > +    return None
> > +
> > +
> >  def find_project_by_header(mail):
> >      project = None
> >      listid_res = [re.compile(r'.*<([^>]+)>.*', re.S),
> > @@ -181,7 +197,8 @@ def find_project_by_header(mail):
> >  
> >              listid = match.group(1)
> >  
> > -            project = find_project_by_id(listid)
> > +            project = find_project_by_id_and_subject(listid,
> > +                                                     mail.get('Subject'))
> >              if project:
> >                  break
> Again here there is the 2 ways to find a project. It does look like
> subject matching only happens if you don't specify --list-id and I think
> that is probably not the approach I want to take.
> 
> Patchwork also extracts the bits of the subject between square brackets
> - would matching on that be more appropriate? (See subject_prefixes in
> parser.py.) That would mean the regex will automatically select for
> things in square brackets without having to encode that in the regex,
> which might avoid some user confusion and misdirected mail.
> 

I though about this but didn't find an easy way to make it work for us.
For example, we accept all of

RHEL7.4
RHEL 7.4
RHEL74
RHEL-7.4
RHEL-74

which means finding 'RHEL' and '7.4' as elements in subject_prefixes is
not enough and we'd need to do something more elaborate, taking into
account possible separators between keywords etc.

Besides that, if you only need to match against single keyword in the
whole subject line (like the 'ethtool' project you mentioned previously)
it's enough to use that keyword since re.search() matches it and you can
avoid all of the .* regex stuff. Maybe I should mention it in the option
description as well...

> Lastly, please could you also add a simple test case for this: doesn't need
> to
> be big, just enough to show that the feature works.
> 

Okay, I'll add that.

> Thanks again for your contribution to Patchwork!
> 
> Regards,
> Daniel
> 

Thanks,
Veronika
Daniel Axtens Feb. 13, 2018, 11:21 a.m. UTC | #12
Hi Veronika,

Apologies for the delay in getting back to you, and thanks for your responses.
 
>> Consider a list like netdev. It has at least a couple of userspace
>> projects (iproute2, ethtool) that could conceivably be spun off into
>> their own lists, and which could use subject matches, but most patches
>> are for the linux kernel and do not have usable subject matches.
>> 
>> As I understand it, you would need to have 3 projects:
>>  - for iproute2, subject-match \[[^\]]*iproute[^\]]*\]
>>  - for ethtool, subject-match \[[^\]]*ethtool[^\]]*\]
>> Then how would you match the rest of the list? You could have an empty
>> subject-match, but then if that is loaded first out of the database, it
>> would match first and patches would never make it to the iproute2 or
>> ethtool projects.
>> 
>
> My original plan was that if you need both matching projects and a default
> one on top of that you can just do a negative lookahead.
>
>> I think you need some sort of default or fall-back or last-resort - the
>> name is not important: just that it will match *only* if no other
>> project matches.
>> 
>> Does that sound right?
>> 
>
> Using a default project is doable and makes sense. Would it be acceptable
> if we first checked for subject matches and in case no match was found,
> checked for a project with same list_id and empty subject_match or do you
> have something different in mind?

That sounds like a good strategy. I certainly prefer it to negative
lookahead.

>
>> I have some other minor comments which I have included throughout:
>> 
>> > diff --git a/docs/api.yaml b/docs/api.yaml
>> > index 3e79f0b..3373226 100644
>> > --- a/docs/api.yaml
>> > +++ b/docs/api.yaml
>> > @@ -374,6 +374,9 @@ definitions:
>> >        list_id:
>> >          type: string
>> >          description: Mailing list identifier for project.
>> > +      subject_match:
>> > +        type: string
>> > +        description: Regex used for email filtering.
>> This is good.
>> 
>> > diff --git a/docs/deployment/management.rst
>> > b/docs/deployment/management.rst
>> > index c50b7b6..870d7ee 100644
>> > --- a/docs/deployment/management.rst
>> > +++ b/docs/deployment/management.rst
>> > @@ -63,7 +63,7 @@ due to, for example, an outage.
>> >  .. option:: --list-id <list-id>
>> >  
>> >     mailing list ID. If not supplied, this will be extracted from the mail
>> > -   headers.
>> > +   headers. Don't use this option if you require filtering based on
>> > subjects.
>> >  
>> >  .. option:: infile
>> 
>> I don't understand why this would interfere with subject
>> filtering. I notice below that you've added a new method that matches by
>> list-id and subject and kept the old list-id matching one - does the
>> command-line option use the old method?
>> 
>> I think I would prefer that there only be one system of mapping mails to
>> projects and for this restriction to go away, but if there's a good
>> reason for it to stay I am always open to persuasion.
>> 
>
> Yes, the command line option uses the old function. The rationale is that
> if there are more projects with same list_id which differ in subject match
> fields, how do you know which one to choose if you check for valid project
> based only on list_id?
>
> If you check the parse_mail() function, it chooses either the
> find_project_by_id() or find_project_by_header() functions based on whether
> list_id option was provided, so I kept this behavior and only changed the
> helper function for find_project_by_header().
>
> It's not hard to change find_project_by_id() to respect subject matches
> too if you want. I just felt that it defeats the purpose of having the
> option to skip header matching at altogether.

Oh I see and I think I understand.

From a conceptual point of view, I see the --list-id option as just
overriding the list id; I don't think it overrides any other headers
(X-Patchwork-* for example).

However, I realised that my motivation is largely that I'd like to take
my existing archives of email, set up a new project with the same
list-id as the default project but a different subject match, and then
use --list-id=patchwork.ozlabs.org to injest my mail archive from a
different list. I realise now that this is just laziness on my part: I
should just change the list ids!

I wonder if anyone else uses the --list-id parameter (especially live)
and how best to serve their needs. I lean towards enabling subject
matching even if it is used, just in case someone feeds a non-list or
multiple lists into the one patchwork project. If anyone has thoughts
here I'd appreciate them.

(If your v2 addresses my other comments then I'll merge it either way,
we can always change it later.)

>
>> > diff --git a/patchwork/api/project.py b/patchwork/api/project.py
>> > index 446c473..597f605 100644
>> > --- a/patchwork/api/project.py
>> > +++ b/patchwork/api/project.py
>> > @@ -39,8 +39,9 @@ class ProjectSerializer(HyperlinkedModelSerializer):
>> >      class Meta:
>> >          model = Project
>> >          fields = ('id', 'url', 'name', 'link_name', 'list_id',
>> >          'list_email',
>> > -                  'web_url', 'scm_url', 'webscm_url', 'maintainers')
>> > -        read_only_fields = ('name', 'maintainers')
>> > +                  'web_url', 'scm_url', 'webscm_url', 'maintainers',
>> > +                  'subject_match')
>> > +        read_only_fields = ('name', 'maintainers', 'subject_match')
>> This looks good - albeit I am not an API expert.
>> 
>> > diff --git a/patchwork/models.py b/patchwork/models.py
>> > index 11886f1..907707f 100644
>> > --- a/patchwork/models.py
>> > +++ b/patchwork/models.py
>> > @@ -71,8 +71,14 @@ class Project(models.Model):
>> >  
>> >      linkname = models.CharField(max_length=255, unique=True)
>> >      name = models.CharField(max_length=255, unique=True)
>> > -    listid = models.CharField(max_length=255, unique=True)
>> > +    listid = models.CharField(max_length=255)
>> >      listemail = models.CharField(max_length=200)
>> > +    subject_match = models.CharField(
>> > +        max_length=64, blank=True, default='', help_text='Regex to match
>> > the '
>> > +        'subject against if only part of emails sent to the list belongs
>> > to '
>> > +        'this project. Will be used with IGNORECASE and MULTILINE flags.
>> > If '
>> > +        'rules for more projects match the first one returned from DB is '
>> > +        'chosen.')
>> 
>> I don't want to bikeshed this too much, but it definitely needs to say
>> that a blank string matches everything.
>
> Sure, will add.
(This is obvious, but please ensure your updated behaviour for default
projects is reflected here and in the docs.)


>> >  def find_project_by_header(mail):
>> >      project = None
>> >      listid_res = [re.compile(r'.*<([^>]+)>.*', re.S),
>> > @@ -181,7 +197,8 @@ def find_project_by_header(mail):
>> >  
>> >              listid = match.group(1)
>> >  
>> > -            project = find_project_by_id(listid)
>> > +            project = find_project_by_id_and_subject(listid,
>> > +                                                     mail.get('Subject'))
>> >              if project:
>> >                  break
>> Again here there is the 2 ways to find a project. It does look like
>> subject matching only happens if you don't specify --list-id and I think
>> that is probably not the approach I want to take.
>> 
>> Patchwork also extracts the bits of the subject between square brackets
>> - would matching on that be more appropriate? (See subject_prefixes in
>> parser.py.) That would mean the regex will automatically select for
>> things in square brackets without having to encode that in the regex,
>> which might avoid some user confusion and misdirected mail.
>> 
>
> I though about this but didn't find an easy way to make it work for us.
> For example, we accept all of
>
> RHEL7.4
> RHEL 7.4
> RHEL74
> RHEL-7.4
> RHEL-74
>
> which means finding 'RHEL' and '7.4' as elements in subject_prefixes is
> not enough and we'd need to do something more elaborate, taking into
> account possible separators between keywords etc.

I see - keep matching by the entire subject then.

> Besides that, if you only need to match against single keyword in the
> whole subject line (like the 'ethtool' project you mentioned previously)
> it's enough to use that keyword since re.search() matches it and you can
> avoid all of the .* regex stuff. Maybe I should mention it in the option
> description as well...

See, I want to match (and these are real examples from netdev)
[PATCH ethtool v2] ethtool: Support for FEC encoding control
but not
[PATCH RFC 07/11] dpaa_eth: add ethtool functionality
So I do want to keep the \[ and \] in there somehow.

I think we should assume people know a little about re.match() and not
put too much of that in the description lest it become super long.

Thanks again.

Regards,
Daniel

>> Lastly, please could you also add a simple test case for this: doesn't need
>> to
>> be big, just enough to show that the feature works.
>> 
>
> Okay, I'll add that.
>
>> Thanks again for your contribution to Patchwork!
>> 
>> Regards,
>> Daniel
>> 
>
> Thanks,
> Veronika
diff mbox series

Patch

diff --git a/docs/api.yaml b/docs/api.yaml
index 3e79f0b..3373226 100644
--- a/docs/api.yaml
+++ b/docs/api.yaml
@@ -374,6 +374,9 @@  definitions:
       list_id:
         type: string
         description: Mailing list identifier for project.
+      subject_match:
+        type: string
+        description: Regex used for email filtering.
       list_email:
         type: string
         description: Mailing list email address for project.
diff --git a/docs/deployment/management.rst b/docs/deployment/management.rst
index c50b7b6..870d7ee 100644
--- a/docs/deployment/management.rst
+++ b/docs/deployment/management.rst
@@ -63,7 +63,7 @@  due to, for example, an outage.
 .. option:: --list-id <list-id>
 
    mailing list ID. If not supplied, this will be extracted from the mail
-   headers.
+   headers. Don't use this option if you require filtering based on subjects.
 
 .. option:: infile
 
@@ -88,7 +88,7 @@  the :ref:`deployment installation guide <deployment-parsemail>`.
 .. option:: --list-id <list-id>
 
    mailing list ID. If not supplied, this will be extracted from the mail
-   headers.
+   headers. Don't use this option if you require filtering based on subjects.
 
 .. option:: infile
 
diff --git a/patchwork/api/project.py b/patchwork/api/project.py
index 446c473..597f605 100644
--- a/patchwork/api/project.py
+++ b/patchwork/api/project.py
@@ -39,8 +39,9 @@  class ProjectSerializer(HyperlinkedModelSerializer):
     class Meta:
         model = Project
         fields = ('id', 'url', 'name', 'link_name', 'list_id', 'list_email',
-                  'web_url', 'scm_url', 'webscm_url', 'maintainers')
-        read_only_fields = ('name', 'maintainers')
+                  'web_url', 'scm_url', 'webscm_url', 'maintainers',
+                  'subject_match')
+        read_only_fields = ('name', 'maintainers', 'subject_match')
         extra_kwargs = {
             'url': {'view_name': 'api-project-detail'},
         }
diff --git a/patchwork/migrations/0021_add_subject_match_to_project.py b/patchwork/migrations/0021_add_subject_match_to_project.py
new file mode 100644
index 0000000..6066266
--- /dev/null
+++ b/patchwork/migrations/0021_add_subject_match_to_project.py
@@ -0,0 +1,29 @@ 
+# -*- coding: utf-8 -*-
+# Generated by Django 1.10.8 on 2018-01-19 18:16
+from __future__ import unicode_literals
+
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+
+    dependencies = [
+        ('patchwork', '0020_tag_show_column'),
+    ]
+
+    operations = [
+        migrations.AddField(
+            model_name='project',
+            name='subject_match',
+            field=models.CharField(blank=True, default=b'', help_text=b'Regex to match the subject against if only part of emails sent to the list belongs to this project. Will be used with IGNORECASE and MULTILINE flags. If rules for more projects match the first one returned from DB is chosen.', max_length=64),
+        ),
+        migrations.AlterField(
+            model_name='project',
+            name='listid',
+            field=models.CharField(max_length=255),
+        ),
+        migrations.AlterUniqueTogether(
+            name='project',
+            unique_together=set([('listid', 'subject_match')]),
+        ),
+    ]
diff --git a/patchwork/models.py b/patchwork/models.py
index 11886f1..907707f 100644
--- a/patchwork/models.py
+++ b/patchwork/models.py
@@ -71,8 +71,14 @@  class Project(models.Model):
 
     linkname = models.CharField(max_length=255, unique=True)
     name = models.CharField(max_length=255, unique=True)
-    listid = models.CharField(max_length=255, unique=True)
+    listid = models.CharField(max_length=255)
     listemail = models.CharField(max_length=200)
+    subject_match = models.CharField(
+        max_length=64, blank=True, default='', help_text='Regex to match the '
+        'subject against if only part of emails sent to the list belongs to '
+        'this project. Will be used with IGNORECASE and MULTILINE flags. If '
+        'rules for more projects match the first one returned from DB is '
+        'chosen.')
 
     # url metadata
 
@@ -100,6 +106,7 @@  class Project(models.Model):
         return self.name
 
     class Meta:
+        unique_together = (('listid', 'subject_match'),)
         ordering = ['linkname']
 
 
diff --git a/patchwork/parser.py b/patchwork/parser.py
index ac7dc5f..015f709 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -157,9 +157,25 @@  def find_project_by_id(list_id):
         project = Project.objects.get(listid=list_id)
     except Project.DoesNotExist:
         logger.debug("'%s' if not a valid project list-id", list_id)
+    except Project.MultipleObjectsReturned:
+        logger.debug("Multiple projects with list-id '%s' found", list_id)
     return project
 
 
+def find_project_by_id_and_subject(list_id, subject):
+    """Find a `project` object based on `list_id` and subject prefix match."""
+    projects = Project.objects.filter(listid=list_id)
+    for project in projects:
+        if (not project.subject_match or
+                re.search(project.subject_match, subject,
+                          re.MULTILINE | re.IGNORECASE)):
+            return project
+
+    logger.debug("No project to match email with list-id '%s', subject '%s' "
+                 "found", list_id, subject)
+    return None
+
+
 def find_project_by_header(mail):
     project = None
     listid_res = [re.compile(r'.*<([^>]+)>.*', re.S),
@@ -181,7 +197,8 @@  def find_project_by_header(mail):
 
             listid = match.group(1)
 
-            project = find_project_by_id(listid)
+            project = find_project_by_id_and_subject(listid,
+                                                     mail.get('Subject'))
             if project:
                 break
 
diff --git a/releasenotes/notes/list-filtering-4643d98b4064367a.yaml b/releasenotes/notes/list-filtering-4643d98b4064367a.yaml
new file mode 100644
index 0000000..21c1680
--- /dev/null
+++ b/releasenotes/notes/list-filtering-4643d98b4064367a.yaml
@@ -0,0 +1,10 @@ 
+---
+features:
+  - |
+    Allow list filtering into multiple projects (and email dropping) based on
+    subject prefixes. Disabled by default, enable by specifying a regular
+    expression which needs to be matched in the subject on a per-project basis
+    (field `subject_match`).
+api:
+  - |
+    Expose `subject_match` in REST API.