From patchwork Fri May 6 17:49:26 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Stephen Finucane
X-Patchwork-Id: 1627718
From: Stephen Finucane
To: patchwork@lists.ozlabs.org
Subject: [PATCH 2/n] parser: Ignore CFWS in Message-ID header
Date: Fri, 6 May 2022 18:49:26 +0100
Message-Id: <20220506174925.731698-1-stephen@that.guru>
In-Reply-To: <20220506171027.723718-1-stephen@that.guru>
References: <20220506171027.723718-1-stephen@that.guru>
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Patchwork development

We recently started stripping comments and folding white space from the
In-Reply-To and References headers. Do so also for the Message-ID
header.

Signed-off-by: Stephen Finucane
Related: #399
---
 patchwork/parser.py            | 43 ++++++++++++++++++++++++++--------
 patchwork/tests/test_parser.py | 32 +++++++++++++++++++++++++
 2 files changed, 65 insertions(+), 10 deletions(-)

diff --git patchwork/parser.py patchwork/parser.py
index 17cc2325..f219f466 100644
--- patchwork/parser.py
+++ patchwork/parser.py
@@ -236,15 +236,14 @@ def _find_series_by_references(project, mail):
     name, prefixes = clean_subject(subject, [project.linkname])
     version = parse_version(name, prefixes)
 
-    refs = find_references(mail)
-    h = clean_header(mail.get('Message-Id'))
-    if h:
-        refs = [h] + refs
+    msg_id = find_message_id(mail)
+    refs = [msg_id] + find_references(mail)
 
     for ref in refs:
         try:
             series = SeriesReference.objects.get(
-                msgid=ref[:255], project=project).series
+                msgid=ref[:255], project=project,
+            ).series
 
             if series.version != version:
                 # if the versions don't match, at least make sure these were
@@ -473,6 +472,34 @@ def find_headers(mail):
     return '\n'.join(strings)
 
 
+def find_message_id(mail):
+    """Extract the 'message-id' header from a given mail and validate it.
+
+    The validation here is simply checking that the Message-ID is correctly
+    formatted per RFC 2822. However, even if it's not we'll attempt to use what
+    we're given because a patch tracked in Patchwork with janky threading is
+    better than no patch whatsoever.
+    """
+    header = clean_header(mail.get('Message-Id'))
+    if not header:
+        raise ValueError("Broken 'Message-Id' header")
+
+    msgid = _msgid_re.search(header)
+    if msgid:
+        msgid = msgid.group(0)
+    else:
+        # This is only info level since the admin likely can't do anything
+        # about this
+        logger.info(
+            "Malformed 'Message-Id' header. The 'msg-id' component should be "
+            "surrounded by angle brackets. Saving raw header. This may "
+            "include comments and extra whitespace."
+        )
+        msgid = header
+
+    return msgid[:255]
+
+
 def find_references(mail):
     """Construct a list of possible reply message ids.
 
@@ -1062,11 +1089,7 @@ def parse_mail(mail, list_id=None):
 
     # parse metadata
 
-    msgid = clean_header(mail.get('Message-Id'))
-    if not msgid:
-        raise ValueError("Broken 'Message-Id' header")
-    msgid = msgid[:255]
-
+    msgid = find_message_id(mail)
     subject = mail.get('Subject')
     name, prefixes = clean_subject(subject, [project.linkname])
     is_comment = subject_check(subject)
diff --git patchwork/tests/test_parser.py patchwork/tests/test_parser.py
index f65ad4b1..980a8afb 100644
--- patchwork/tests/test_parser.py
+++ patchwork/tests/test_parser.py
@@ -1265,6 +1265,38 @@ class DuplicateMailTest(TestCase):
         self.assertEqual(Cover.objects.count(), 1)
 
 
+class TestFindMessageID(TestCase):
+
+    def test_find_message_id__missing_header(self):
+        email = create_email('test')
+        del email['Message-Id']
+        email['Message-Id'] = ''
+
+        with self.assertRaises(ValueError) as cm:
+            parser.find_message_id(email)
+        self.assertIn("Broken 'Message-Id' header", str(cm.exception))
+
+    def test_find_message_id__header_with_comments(self):
+        """Test that we strip comments from the Message-ID field."""
+        message_id = ' (message ID with a comment)'
+        email = create_email('test', msgid=message_id)
+
+        expected = ''
+        actual = parser.find_message_id(email)
+
+        self.assertEqual(expected, actual)
+
+    def test_find_message_id__invalid_header_fallback(self):
+        """Test that we accept badly formatted Message-ID fields."""
+        message_id = '5899d592-8c87-47d9-92b6-d34260ce1aa4@radware.com>'
+        email = create_email('test', msgid=message_id)
+
+        expected = '5899d592-8c87-47d9-92b6-d34260ce1aa4@radware.com>'
+        actual = parser.find_message_id(email)
+
+        self.assertEqual(expected, actual)
+
+
 class TestFindReferences(TestCase):
 
     def test_find_references__header_with_comments(self):