diff mbox series

parser: Don't crash when From: is list email but has weird mangle format

Message ID 20200415090656.21101-1-ajd@linux.ibm.com
State Accepted
Headers show
Series parser: Don't crash when From: is list email but has weird mangle format | expand

Commit Message

Andrew Donnellan April 15, 2020, 9:06 a.m. UTC
get_original_sender() tries to demangle DMARC-mangled From headers, in
the case where the email's From address is the list address. It knows how
to handle Google Groups and Mailman style mangling, where the original
submitter's name will be turned into e.g. "Andrew Donnellan via
linuxppc-dev".

If an email has the From header set to the list address but has a name that
doesn't include " via ", we'll throw an exception because stripped_name
hasn't been set. Sometimes this is because the list name is seemingly
empty, resulting in a mangled name like "Andrew Donnellan via"
without the space after "via" that we detect. Handle this as well as we can
instead, and add a test.

Fixes: 8279a84238c10 ("parser: Unmangle From: headers that have been mangled for DMARC purposes")
Reported-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>

---

Backport to stable?
---
 patchwork/parser.py            |  7 +++++++
 patchwork/tests/test_parser.py | 23 +++++++++++++++++++++++
 2 files changed, 30 insertions(+)

Comments

Stephen Finucane April 18, 2020, 11:04 a.m. UTC | #1
On Wed, 2020-04-15 at 19:06 +1000, Andrew Donnellan wrote:
> get_original_sender() tries to demangle DMARC-mangled From headers, in
> the case where the email's From address is the list address. It knows how
> to handle Google Groups and Mailman style mangling, where the original
> submitter's name will be turned into e.g. "Andrew Donnellan via
> linuxppc-dev".
> 
> If an email has the From header set to the list address but has a name that
> doesn't include " via ", we'll throw an exception because stripped_name
> hasn't been set. Sometimes this is because the list name is seemingly
> empty, resulting in a mangled name like "Andrew Donnellan via"
> without the space after "via" that we detect. Handle this as well as we can
> instead, and add a test.
> 
> Fixes: 8279a84238c10 ("parser: Unmangle From: headers that have been mangled for DMARC purposes")
> Reported-by: Jeremy Kerr <jk@ozlabs.org>
> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>

Looks good to me.

Reviewed-by: Stephen Finucane <stephen@that.guru>

---
> 
> Backport to stable?

Merged to master and backported to stable.

Stephen
diff mbox series

Patch

diff --git a/patchwork/parser.py b/patchwork/parser.py
index a09fd754c4be..dce03a4ff827 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -373,6 +373,13 @@  def get_original_sender(mail, name, email):
         # Mailman uses the format "<name> via <list>"
         # Google Groups uses "'<name>' via <list>"
         stripped_name = name[:name.rfind(' via ')].strip().strip("'")
+    elif name.endswith(' via'):
+        # Sometimes this seems to happen (perhaps if Mailman isn't set up with
+        # any list name)
+        stripped_name = name[:name.rfind(' via')].strip().strip("'")
+    else:
+        # We've hit a format that we don't expect
+        stripped_name = None
 
     original_from = clean_header(mail.get('X-Original-From', ''))
     if original_from:
diff --git a/patchwork/tests/test_parser.py b/patchwork/tests/test_parser.py
index f5631bee8329..a60eb6b4bac9 100644
--- a/patchwork/tests/test_parser.py
+++ b/patchwork/tests/test_parser.py
@@ -366,6 +366,29 @@  class SenderCorrelationTest(TestCase):
         self.assertEqual(person_b._state.adding, False)
         self.assertEqual(person_b.id, person_a.id)
 
+    def test_weird_dmarc_munging(self):
+        project = create_project()
+        real_sender = 'Existing Sender <existing@example.com>'
+        munged_sender1 = "'Existing Sender' via <{}>".format(project.listemail)
+        munged_sender2 = "'Existing Sender' <{}>".format(project.listemail)
+
+        # Unmunged author
+        mail = self._create_email(real_sender)
+        person_a = get_or_create_author(mail, project)
+        person_a.save()
+
+        # Munged with no list name
+        mail = self._create_email(munged_sender1, None, None, real_sender)
+        person_b = get_or_create_author(mail, project)
+        self.assertEqual(person_b._state.adding, False)
+        self.assertEqual(person_b.id, person_a.id)
+
+        # Munged with no 'via'
+        mail = self._create_email(munged_sender2, None, None, real_sender)
+        person_b = get_or_create_author(mail, project)
+        self.assertEqual(person_b._state.adding, False)
+        self.assertEqual(person_b.id, person_a.id)
+
 
 class SeriesCorrelationTest(TestCase):
     """Validate correct behavior of find_series."""