diff mbox

Guessing project based on email address

Message ID 1297719787.10411.46.camel@feioso
State RFC
Headers show

Commit Message

Guilherme Salgado Feb. 14, 2011, 9:43 p.m. UTC
Hi folks,

We're going to use Patchwork to track Linaro patches, but instead of
subscribing to all mailing lists to which patches may be sent, we're
asking all Linaro developers to CC a common email address whenever they
send a patch upstream, and we'll then feed all mail delivered to that
address into Patchwork.

That means most of the messages won't have a List-ID header, but we
should still be able to lookup the correct Patchwork project for a given
message based on the email addresses of recipients, hence I'd like to
propose changing find_project() in apps/patchwork/bin/parsemail.py to
fallback to email address lookup when a project can't be found with the
List-ID in the message.

I think such a change wouldn't cause any harm to regular Patchwork
instances and may even be helpful if a mailing list's list-id is
changed.  I'm including a patch which does what I've described above,
just to illustrate, but if you guys think this is a sane change, I'll be
happy to send a properly formatted patch (as soon as I figure out how to
make git combine the patch and the cover letter in a single email ;).

Cheers,
Guilherme

Comments

Jeremy Kerr Feb. 15, 2011, 12:42 a.m. UTC | #1
Hi Guilherme,

> We're going to use Patchwork to track Linaro patches, but instead of
> subscribing to all mailing lists to which patches may be sent, we're
> asking all Linaro developers to CC a common email address whenever they
> send a patch upstream, and we'll then feed all mail delivered to that
> address into Patchwork.

I don't see the reasoning for doing this - it's more work (once-off setup vs 
getting every contributor to do stuff repeatedly) and less reliable 
(contributors may forget to add the magic address, or not know about this 
requirement).

If there's sound justification for doing this though, I'm not averse to the 
change, but it needs to be sensible :)

Cheers,


Jeremy
Guilherme Salgado Feb. 15, 2011, 11:10 a.m. UTC | #2
Hi Jeremy,

On Tue, 2011-02-15 at 08:42 +0800, Jeremy Kerr wrote:
> Hi Guilherme,
> 
> > We're going to use Patchwork to track Linaro patches, but instead of
> > subscribing to all mailing lists to which patches may be sent, we're
> > asking all Linaro developers to CC a common email address whenever they
> > send a patch upstream, and we'll then feed all mail delivered to that
> > address into Patchwork.
> 
> I don't see the reasoning for doing this - it's more work (once-off setup vs 
> getting every contributor to do stuff repeatedly) and less reliable 
> (contributors may forget to add the magic address, or not know about this 
> requirement).

Maybe I wasn't clear, but this is not to track patches sent to Linaro;
what we want is to track patches sent upstream by engineers working for
Linaro.  It already is the company's policy that all patches sent
upstream should be CCed to this magic address, and feeding that address'
mailbox to patchwork.linaro.org seemed like a better alternative than
continuously having to subscribe patchwork.l.o to the mailing lists of
all different projects that Linaro engineers may contribute to (we don't
have a list of all these projects beforehand and we expect it to grow
all the time).

Cheers,
Jeremy Kerr March 8, 2011, 5:58 a.m. UTC | #3
Hi Guilherme,

> all different projects that Linaro engineers may contribute to (we don't
> have a list of all these projects beforehand and we expect it to grow
> all the time).

You'll still have the same problem here; patchwork needs to know about the 
project before it will parse patches for that project.

My only concern is the impresicion of using the list address; it'd be common 
to send a patch to mulitple lists. If patchwork is configured for both lists, 
we'll end up with a dropped patch on the non-matched list.

We could work-around this with handling multiple 'parses' per incoming email, 
but we'd need to be careful about avoiding duplicates here.

Also, if this was configurable (PATCHWORK_FALLBACK_TO_LISTEMAIL perhaps?), I'd 
be much happier :)

Cheers,


Jeremy
Guilherme Salgado March 9, 2011, 11:11 a.m. UTC | #4
On Tue, 2011-03-08 at 13:58 +0800, Jeremy Kerr wrote:
> Hi Guilherme,
> 
> > all different projects that Linaro engineers may contribute to (we don't
> > have a list of all these projects beforehand and we expect it to grow
> > all the time).
> 
> You'll still have the same problem here; patchwork needs to know about the 
> project before it will parse patches for that project.

You're right, but we have a custom email-parsing script which places
patches sent to unknown mailing lists under a catch-all project. That
allows us to identify new projects that need to be created, and together
with a new form for moving multiple patches from one project to another
we can also easily move the patches from the catch-all project to the
newly created ones.

I haven't sent the patches for those changes because I thought they
wouldn't be of much use in general, but I'm happy to send them if you
think they could be useful to others.

> 
> My only concern is the impresicion of using the list address; it'd be common 
> to send a patch to mulitple lists. If patchwork is configured for both lists, 
> we'll end up with a dropped patch on the non-matched list.
> 
> We could work-around this with handling multiple 'parses' per incoming email, 
> but we'd need to be careful about avoiding duplicates here.

We could also make find_project() return a list of projects and create
one Patch for every project returned.

> 
> Also, if this was configurable (PATCHWORK_FALLBACK_TO_LISTEMAIL perhaps?), I'd 
> be much happier :)

Sure, I'm happy to make it configurable.

Cheers,
diff mbox

Patch

diff --git a/apps/patchwork/bin/parsemail.py b/apps/patchwork/bin/parsemail.py
index 700cb6f..305ab95 100755
--- a/apps/patchwork/bin/parsemail.py
+++ b/apps/patchwork/bin/parsemail.py
@@ -55,7 +55,7 @@  def clean_header(header):
 
     return normalise_space(u' '.join(fragments))
 
-def find_project(mail):
+def find_project_by_listid(mail):
     project = None
     listid_res = [re.compile('.*<([^>]+)>.*', re.S),
                   re.compile('^([\S]+)$', re.S)]
@@ -81,6 +81,30 @@  def find_project(mail):
 
     return project
 
+def extract_email_addresses(str):
+    email_re = re.compile(
+        r"([_\.0-9a-zA-Z-+=]+@(([0-9a-zA-Z-]{1,}\.)*)[a-zA-Z]{2,})")
+    # re.findall() will return a list of tuples because we have multiple
+    # groups on the regex above, but we're only interested on the outermost
+    # group (which should contain the whole email address), so we drop the
+    # second and third groups.
+    return [email for email, dummy, dummy2 in email_re.findall(str)]
+
+def find_project_by_list_address(mail):
+    recipients = mail.get('To', '') + mail.get('CC', '')
+    for email_address in extract_email_addresses(recipients):
+        try:
+            return Project.objects.get(listemail = email_address)
+        except Project.DoesNotExist:
+            pass
+    return None
+
+def find_project(mail):
+    project = find_project_by_listid(mail)
+    if project is None:
+        project = find_project_by_list_address(mail)
+    return project
+
 def find_author(mail):
 
     from_header = clean_header(mail.get('From'))
diff --git a/apps/patchwork/tests/patchparser.py b/apps/patchwork/tests/patchparser.py
index ff0025a..d4d15d1 100644
--- a/apps/patchwork/tests/patchparser.py
+++ b/apps/patchwork/tests/patchparser.py
@@ -34,8 +34,9 @@  class PatchTest(unittest.TestCase):
     default_subject = defaults.subject
     project = defaults.project
 
-from patchwork.bin.parsemail import find_content, find_author, find_project, \
-                                    parse_mail
+from patchwork.bin.parsemail import (
+    extract_email_addresses, find_content, find_author, find_project,
+    parse_mail)
 
 class InlinePatchTest(PatchTest):
     patch_filename = '0001-add-line.patch'
@@ -276,18 +277,44 @@  class MultipleProjectPatchCommentTest(MultipleProjectPatchTest):
             # and the one we parsed in setUp()
             self.assertEquals(Comment.objects.filter(patch = patch).count(), 2)
 
-class ListIdHeaderTest(unittest.TestCase):
-    """ Test that we parse List-Id headers from mails correctly """
+class EmailProjectGuessing(unittest.TestCase):
+    """Projects are guessed based on List-Id headers or recipient addresses"""
     def setUp(self):
         self.project = Project(linkname = 'test-project-1', name = 'Project 1',
                 listid = '1.example.com', listemail='1@example.com')
         self.project.save()
 
+    def testExtractingEmailAddressesFromRecipientsList(self):
+        emails = extract_email_addresses(
+            '"Foo Bar" <foo.bar@example.com>,'
+            '<baz+list@foo.example.com>,'
+            'bar-foo@bar.foo.example.com,'
+            # Notice that this one is not a valid email address.
+            'bar-foo@.com')
+        self.assertEqual(
+            ['foo.bar@example.com',
+             'baz+list@foo.example.com',
+             'bar-foo@bar.foo.example.com'],
+            emails)
+
     def testNoListId(self):
         email = MIMEText('')
         project = find_project(email)
         self.assertEquals(project, None)
 
+    def testNoListIdWithListEmailAsRecipient(self):
+        email = MIMEText('')
+        email['To'] = '"First dev list" <1@example.com>'
+        project = find_project(email)
+        self.assertEquals(self.project, project)
+
+    def testNoListIdWithListEmailAsCC(self):
+        email = MIMEText('')
+        email['CC'] = ('"First maintainer <maintainer@example.com>, '
+                       '"First dev list" <1@example.com>')
+        project = find_project(email)
+        self.assertEquals(self.project, project)
+
     def testBlankListId(self):
         email = MIMEText('')
         email['List-Id'] = ''