diff mbox series

Include all email headers in mboxes

Message ID 20180327151332.15881-1-vkabatov@redhat.com
State Changes Requested
Headers show
Series Include all email headers in mboxes | expand

Commit Message

Veronika Kabatova March 27, 2018, 3:13 p.m. UTC
From: Veronika Kabatova <vkabatov@redhat.com>

Solves issue #165 (Exported mboxes should include In-Reply-To,
References, etc headers). Instead of including only a few chosen ones,
all received headers are added to mboxes.

Signed-off-by: Veronika Kabatova <vkabatov@redhat.com>
---
 patchwork/views/utils.py | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

Comments

Daniel Axtens April 5, 2018, 9:58 a.m. UTC | #1
vkabatov@redhat.com writes:

> From: Veronika Kabatova <vkabatov@redhat.com>
>
> Solves issue #165 (Exported mboxes should include In-Reply-To,
> References, etc headers). Instead of including only a few chosen ones,
> all received headers are added to mboxes.

Thanks for the patch.

I'm a little worried that this will get really messy - I've included a
snippet of headers from an email from an unrelated bug below. Maybe we
don't care - I guess this isn't really for human consumption but is for
consumption by e.g. git-am.

Alternatively we can blacklist headers: I don't think there's anything
worth having in Received, X-*, List-*, DKIM, ARC, SPF, etc. But I wonder
if this is just whack-a-mole in reverse.

Thoughts?

Regards,
Daniel


Delivered-To: dja@axtens.net
Received: by 10.74.139.216 with SMTP id p24csp390579ooj;
        Wed, 21 Feb 2018 01:07:26 -0800 (PST)
X-Google-Smtp-Source: AH8x227WmScjPDkd1bgpNsd0DYg4ae/OzHPrTT68ezy+A7p6CMpHdsmU6pB6dZluayRkf9LDxR5l
X-Received: by 10.99.55.1 with SMTP id e1mr2155932pga.237.1519204046116;
        Wed, 21 Feb 2018 01:07:26 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1519204046; cv=none;
        d=google.com; s=arc-20160816;
        b=PXNfl2kupZqmeFoDWy2tLz6100KxBfLj/Rmu4/OTqiLwcPVNIsGBGE2FN/e/Gj2LCu
         NDdGSTuaDdYxI0wlCEqBZBrVatoda/PHBX2Fh2HZ2rzEahAs5R2w+YucxRTAi8UT7MTo
         xxxHqQ1DehoZmG9J3l0Ckh1cXmAplnmLB7+MufnD7FT7ajNufPhGiZ2ovDCcqb0/6KxG
         8cfsjAnNAZAF3M8VF8r2KSubGuplPq4sB7aF9eklL0HmnXNVLYkyiwZbJuI/vVh6yBkc
         rKSdP02SlazOIL8FJuJQC5reac17qfbyjg/tbxyCC+/Lmc/DwexMfA4NPhEgqPZgwFWr
         9ILQ==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816;
        h=list-id:precedence:sender:content-transfer-encoding:mime-version
         :message-id:date:subject:to:from:dkim-signature
         :arc-authentication-results;
        bh=CFYy3bhn7eoWPF6AJMH1sMf+K6u6usD+1M2NOg/UHAw=;
        b=VtdyXugEOzZm0I0t+YzlUoOWX+xI4UC50ZF9w5gd+zlNTsJMDGtykiwjZujU4FzMiP
         CdCwQqdlTuB5aY99HtOyBl1jjGz2zo1bGxRwr1aGFE+pbyUN5FFlTntYxZ0AD5wbDP3S
         2hn6mDxzugFSJ9+LbDaa8lS4g6RENzaxHTPt7B6paAkbSFSZsBtY9oASno7RvI3ctlfz
         LCb1gAnhvL5p2H5vpNeIOr9eTQIOqhh7+kmmq7tMky2/aB95az+AaU/dKJQoUTsIY7qT
         gOTZAdHc4+fZHD2CeP9bX8MTUXy1Yc3fNy7ETbYqoe0+Gl2ke8m9nw84rHTmidoygKrH
         CwoQ==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@gmail.com header.s=20161025 header.b=YTP6W3WD;
       spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180
.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67])
        by mx.google.com with ESMTP id s9si2120330pgn.281.2018.02.21.01.07.26
        for <dja@axtens.net>;
        Wed, 21 Feb 2018 01:07:26 -0800 (PST)
Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.
180.67 as permitted sender) client-ip=209.132.180.67;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@gmail.com header.s=20161025 header.b=YTP6W3WD;
       spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180
.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753028AbeBUJHV (ORCPT <rfc822;dja@axtens.net>);
        Wed, 21 Feb 2018 04:07:21 -0500
Received: from mail-wr0-f196.google.com ([209.85.128.196]:34728 "EHLO
        mail-wr0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1752409AbeBUJHR (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 21 Feb 2018 04:07:17 -0500
Received: by mail-wr0-f196.google.com with SMTP id m5so2297944wrg.1;
        Wed, 21 Feb 2018 01:07:17 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=from:to:subject:date:message-id:mime-version
         :content-transfer-encoding;
        bh=CFYy3bhn7eoWPF6AJMH1sMf+K6u6usD+1M2NOg/UHAw=;
        b=YTP6W3WDHDARqaf+T4aDY1Z5IM4p2NlgLAJEnVsmJMIqE6PIL/1gIU2w7znKP2BZ2X
         7xNMoPQrYm8bHmWr5ZNG5d2ql+zQOCnFE8XKIYCUCz4rYb0umccrvDcSgrGs27/UvIw3
         8Tc0WFplVgoOsHMW+k7YF+a3an50Q/gDWwjT/9h/lvzRRZ5H2t5NAaKuuikgQoGSi6Su
         /FQBXhY8qT6E5TeC2r6XnuyQy73YV1DsUMZ8Ehjy/6tbJhUMsveshHC8/tNh9TbxhWoK
         fVlGEDVNmR2MHAkDnZFkACakllfB9sE3FlkwEQ0hPDUsUKEgS5RXoNlfc03Y3W7RNFpx
         XXIw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:from:to:subject:date:message-id:mime-version
         :content-transfer-encoding;
        bh=CFYy3bhn7eoWPF6AJMH1sMf+K6u6usD+1M2NOg/UHAw=;
        b=ORPK99ZlmTZ3ktjBaeVk3H3JZ/9sR4C8ohoUBdTnadoY6qa/wIhusqOXjEJo+RBS7D
         YZ6zbHR0iFM2ErnSnEgBUWQMgjsi3uSzTrHz+Fz0JDW02P9/noCHfXfl95yMRR5xLlS4
         GeAIZKjHHYNmKK44oYrUYHfhpokK99afmJ0zIlbDjBbuyjsCgGih4s4+/n1IcGQQ91er
         pc9BUG/AwcAfmRkm6dryBzc3Qk4UA0EulG2iVjZmMlqO0GL1V4w0RjTx37Ym9+gAmMlq
         oKbLQq/ELtj252qNkRw1bxoE5PLBpfszmAv0kREKzvN7+9LNjYCac3h97xIz7AqQKdFW
         PiEg==
X-Gm-Message-State: APf1xPDny4m9NzXZAkwJRYoao3cmAcW+tMOYiCFlZSovJ31ZZJ21t0qk
        lHrju+PIF9J/hpFO6iSeSeE=
X-Received: by 10.28.167.208 with SMTP id q199mr1530756wme.29.1519204036496;
        Wed, 21 Feb 2018 01:07:16 -0800 (PST)
Received: from baker.fritz.box ([2a02:908:1251:8fc0:d050:cb11:74d7:d8f6])
        by smtp.gmail.com with ESMTPSA id x189sm1661875wmg.23.2018.02.21.01.07.15
        (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
        Wed, 21 Feb 2018 01:07:16 -0800 (PST)
From:   "=?UTF-8?q?Christian=20K=C3=B6nig?=" 
        <ckoenig.leichtzumerken@gmail.com>
X-Google-Original-From: =?UTF-8?q?Christian=20K=C3=B6nig?= <christian.koenig@amd.com>
To:     bhelgaas@google.com, linux-kernel@vger.kernel.org,
        linux-pci@vger.kernel.org
Subject: [PATCH] PCI: stop crashing in pci_release_resource v2
Date:   Wed, 21 Feb 2018 10:07:15 +0100
Message-Id: <20180221090715.2853-1-christian.koenig@amd.com>
X-Mailer: git-send-email 2.14.1
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-TUID: AvE/m/C3rDgS

>
> Signed-off-by: Veronika Kabatova <vkabatov@redhat.com>
> ---
>  patchwork/views/utils.py | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/patchwork/views/utils.py b/patchwork/views/utils.py
> index 84682b8..5c28308 100644
> --- a/patchwork/views/utils.py
> +++ b/patchwork/views/utils.py
> @@ -89,21 +89,17 @@ def _submission_to_mbox(submission):
>      utc_timestamp = delta.seconds + delta.days * 24 * 3600
>  
>      mail = PatchMbox(body)
> -    mail['Subject'] = submission.name
>      mail['X-Patchwork-Submitter'] = email.utils.formataddr((
>          str(Header(submission.submitter.name, mail.patch_charset)),
>          submission.submitter.email))
>      mail['X-Patchwork-Id'] = str(submission.id)
>      if is_patch and submission.delegate:
>          mail['X-Patchwork-Delegate'] = str(submission.delegate.email)
> -    mail['Message-Id'] = submission.msgid
>      mail.set_unixfrom('From patchwork ' + submission.date.ctime())
>  
> -    copied_headers = ['To', 'Cc', 'Date', 'From', 'List-Id']
>      orig_headers = HeaderParser().parsestr(str(submission.headers))
> -    for header in copied_headers:
> -        if header in orig_headers:
> -            mail[header] = orig_headers[header]
> +    for header in orig_headers.keys():
> +        mail[header] = orig_headers[header]
>  
>      if 'Date' not in mail:
>          mail['Date'] = email.utils.formatdate(utc_timestamp)
> -- 
> 2.13.6
>
> _______________________________________________
> Patchwork mailing list
> Patchwork@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/patchwork
Johannes Berg April 5, 2018, 11 a.m. UTC | #2
On Thu, 2018-04-05 at 19:58 +1000, Daniel Axtens wrote:
> vkabatov@redhat.com writes:
> 
> > From: Veronika Kabatova <vkabatov@redhat.com>
> > 
> > Solves issue #165 (Exported mboxes should include In-Reply-To,
> > References, etc headers). Instead of including only a few chosen ones,
> > all received headers are added to mboxes.
> 
> Thanks for the patch.
> 
> I'm a little worried that this will get really messy - I've included a
> snippet of headers from an email from an unrelated bug below. Maybe we
> don't care - I guess this isn't really for human consumption but is for
> consumption by e.g. git-am.
> 
> Alternatively we can blacklist headers: I don't think there's anything
> worth having in Received, X-*, List-*, DKIM, ARC, SPF, etc. But I wonder
> if this is just whack-a-mole in reverse.
> 
> Thoughts?

I'm not really sure human consumption would be a worry for mbox files?
Having the full headers - since it's sort of an email archive already -
would be useful though.

In particular, the change to keep the original Subject would be useful
for us, as we have a script that automatically replies to the email
saying "thank you, I've applied your patch" or similar.

That said, using a dict for this is in general not quite right, since
many header lines are valid multiple times, e.g. "Received:", but I'm
not sure they're even all stored.

johannes
Veronika Kabatova April 5, 2018, 11:11 a.m. UTC | #3
----- Original Message -----
> From: "Daniel Axtens" <dja@axtens.net>
> To: vkabatov@redhat.com, patchwork@lists.ozlabs.org
> Sent: Thursday, April 5, 2018 11:58:15 AM
> Subject: Re: [PATCH] Include all email headers in mboxes
> 
> vkabatov@redhat.com writes:
> 
> > From: Veronika Kabatova <vkabatov@redhat.com>
> >
> > Solves issue #165 (Exported mboxes should include In-Reply-To,
> > References, etc headers). Instead of including only a few chosen ones,
> > all received headers are added to mboxes.
> 
> Thanks for the patch.
> 
> I'm a little worried that this will get really messy - I've included a
> snippet of headers from an email from an unrelated bug below. Maybe we
> don't care - I guess this isn't really for human consumption but is for
> consumption by e.g. git-am.
> 
> Alternatively we can blacklist headers: I don't think there's anything
> worth having in Received, X-*, List-*, DKIM, ARC, SPF, etc. But I wonder
> if this is just whack-a-mole in reverse.
> 
> Thoughts?
> 

Hi,

I've thought about ignoring eg headers starting with X-* but ultimately
decided I don't want to exclude them. Maybe people use custom headers for
patch apply hooks or similar craziness, or have CI / tools that download
the mbox and require a specific header set (for example reply to the author
and mailing list if the patch is broken unless X-ignore header is set) etc.
Whatever subset we decide to include, there might be someone who needs one
of the headers that wasn't added. Granted, you can extract the headers from
the API itself but we can say the same about extracting the patch from
there since git am eats input from stdin too.

From my experience, blacklisting never works. Things evolve and new "messy"
headers will be added, or people will just use a different nonstandard
header name if they need custom headers and so on.

As you said, I doubt anyone actually needs to read the mbox files.
Everything included there is already visible in WebUI or API. Also,
skipping the header section / searching the text should work if somebody
really needs to consume raw mboxes without tools.


Veronika

> Regards,
> Daniel
> 
> 
> Delivered-To: dja@axtens.net
> Received: by 10.74.139.216 with SMTP id p24csp390579ooj;
>         Wed, 21 Feb 2018 01:07:26 -0800 (PST)
> X-Google-Smtp-Source:
> AH8x227WmScjPDkd1bgpNsd0DYg4ae/OzHPrTT68ezy+A7p6CMpHdsmU6pB6dZluayRkf9LDxR5l
> X-Received: by 10.99.55.1 with SMTP id e1mr2155932pga.237.1519204046116;
>         Wed, 21 Feb 2018 01:07:26 -0800 (PST)
> ARC-Seal: i=1; a=rsa-sha256; t=1519204046; cv=none;
>         d=google.com; s=arc-20160816;
>         b=PXNfl2kupZqmeFoDWy2tLz6100KxBfLj/Rmu4/OTqiLwcPVNIsGBGE2FN/e/Gj2LCu
>          NDdGSTuaDdYxI0wlCEqBZBrVatoda/PHBX2Fh2HZ2rzEahAs5R2w+YucxRTAi8UT7MTo
>          xxxHqQ1DehoZmG9J3l0Ckh1cXmAplnmLB7+MufnD7FT7ajNufPhGiZ2ovDCcqb0/6KxG
>          8cfsjAnNAZAF3M8VF8r2KSubGuplPq4sB7aF9eklL0HmnXNVLYkyiwZbJuI/vVh6yBkc
>          rKSdP02SlazOIL8FJuJQC5reac17qfbyjg/tbxyCC+/Lmc/DwexMfA4NPhEgqPZgwFWr
>          9ILQ==
> ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
> s=arc-20160816;
>         h=list-id:precedence:sender:content-transfer-encoding:mime-version
>          :message-id:date:subject:to:from:dkim-signature
>          :arc-authentication-results;
>         bh=CFYy3bhn7eoWPF6AJMH1sMf+K6u6usD+1M2NOg/UHAw=;
>         b=VtdyXugEOzZm0I0t+YzlUoOWX+xI4UC50ZF9w5gd+zlNTsJMDGtykiwjZujU4FzMiP
>          CdCwQqdlTuB5aY99HtOyBl1jjGz2zo1bGxRwr1aGFE+pbyUN5FFlTntYxZ0AD5wbDP3S
>          2hn6mDxzugFSJ9+LbDaa8lS4g6RENzaxHTPt7B6paAkbSFSZsBtY9oASno7RvI3ctlfz
>          LCb1gAnhvL5p2H5vpNeIOr9eTQIOqhh7+kmmq7tMky2/aB95az+AaU/dKJQoUTsIY7qT
>          gOTZAdHc4+fZHD2CeP9bX8MTUXy1Yc3fNy7ETbYqoe0+Gl2ke8m9nw84rHTmidoygKrH
>          CwoQ==
> ARC-Authentication-Results: i=1; mx.google.com;
>        dkim=pass header.i=@gmail.com header.s=20161025 header.b=YTP6W3WD;
>        spf=pass (google.com: best guess record for domain of
>        linux-kernel-owner@vger.kernel.org designates 209.132.180
> .67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
>        dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com
> Return-Path: <linux-kernel-owner@vger.kernel.org>
> Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67])
>         by mx.google.com with ESMTP id s9si2120330pgn.281.2018.02.21.01.07.26
>         for <dja@axtens.net>;
>         Wed, 21 Feb 2018 01:07:26 -0800 (PST)
> Received-SPF: pass (google.com: best guess record for domain of
> linux-kernel-owner@vger.kernel.org designates 209.132.
> 180.67 as permitted sender) client-ip=209.132.180.67;
> Authentication-Results: mx.google.com;
>        dkim=pass header.i=@gmail.com header.s=20161025 header.b=YTP6W3WD;
>        spf=pass (google.com: best guess record for domain of
>        linux-kernel-owner@vger.kernel.org designates 209.132.180
> .67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
>        dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com
> Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
>         id S1753028AbeBUJHV (ORCPT <rfc822;dja@axtens.net>);
>         Wed, 21 Feb 2018 04:07:21 -0500
> Received: from mail-wr0-f196.google.com ([209.85.128.196]:34728 "EHLO
>         mail-wr0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
>         with ESMTP id S1752409AbeBUJHR (ORCPT
>         <rfc822;linux-kernel@vger.kernel.org>);
>         Wed, 21 Feb 2018 04:07:17 -0500
> Received: by mail-wr0-f196.google.com with SMTP id m5so2297944wrg.1;
>         Wed, 21 Feb 2018 01:07:17 -0800 (PST)
> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
>         d=gmail.com; s=20161025;
>         h=from:to:subject:date:message-id:mime-version
>          :content-transfer-encoding;
>         bh=CFYy3bhn7eoWPF6AJMH1sMf+K6u6usD+1M2NOg/UHAw=;
>         b=YTP6W3WDHDARqaf+T4aDY1Z5IM4p2NlgLAJEnVsmJMIqE6PIL/1gIU2w7znKP2BZ2X
>          7xNMoPQrYm8bHmWr5ZNG5d2ql+zQOCnFE8XKIYCUCz4rYb0umccrvDcSgrGs27/UvIw3
>          8Tc0WFplVgoOsHMW+k7YF+a3an50Q/gDWwjT/9h/lvzRRZ5H2t5NAaKuuikgQoGSi6Su
>          /FQBXhY8qT6E5TeC2r6XnuyQy73YV1DsUMZ8Ehjy/6tbJhUMsveshHC8/tNh9TbxhWoK
>          fVlGEDVNmR2MHAkDnZFkACakllfB9sE3FlkwEQ0hPDUsUKEgS5RXoNlfc03Y3W7RNFpx
>          XXIw==
> X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
>         d=1e100.net; s=20161025;
>         h=x-gm-message-state:from:to:subject:date:message-id:mime-version
>          :content-transfer-encoding;
>         bh=CFYy3bhn7eoWPF6AJMH1sMf+K6u6usD+1M2NOg/UHAw=;
>         b=ORPK99ZlmTZ3ktjBaeVk3H3JZ/9sR4C8ohoUBdTnadoY6qa/wIhusqOXjEJo+RBS7D
>          YZ6zbHR0iFM2ErnSnEgBUWQMgjsi3uSzTrHz+Fz0JDW02P9/noCHfXfl95yMRR5xLlS4
>          GeAIZKjHHYNmKK44oYrUYHfhpokK99afmJ0zIlbDjBbuyjsCgGih4s4+/n1IcGQQ91er
>          pc9BUG/AwcAfmRkm6dryBzc3Qk4UA0EulG2iVjZmMlqO0GL1V4w0RjTx37Ym9+gAmMlq
>          oKbLQq/ELtj252qNkRw1bxoE5PLBpfszmAv0kREKzvN7+9LNjYCac3h97xIz7AqQKdFW
>          PiEg==
> X-Gm-Message-State: APf1xPDny4m9NzXZAkwJRYoao3cmAcW+tMOYiCFlZSovJ31ZZJ21t0qk
>         lHrju+PIF9J/hpFO6iSeSeE=
> X-Received: by 10.28.167.208 with SMTP id q199mr1530756wme.29.1519204036496;
>         Wed, 21 Feb 2018 01:07:16 -0800 (PST)
> Received: from baker.fritz.box ([2a02:908:1251:8fc0:d050:cb11:74d7:d8f6])
>         by smtp.gmail.com with ESMTPSA id
>         x189sm1661875wmg.23.2018.02.21.01.07.15
>         (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
>         Wed, 21 Feb 2018 01:07:16 -0800 (PST)
> From:   "=?UTF-8?q?Christian=20K=C3=B6nig?="
>         <ckoenig.leichtzumerken@gmail.com>
> X-Google-Original-From: =?UTF-8?q?Christian=20K=C3=B6nig?=
> <christian.koenig@amd.com>
> To:     bhelgaas@google.com, linux-kernel@vger.kernel.org,
>         linux-pci@vger.kernel.org
> Subject: [PATCH] PCI: stop crashing in pci_release_resource v2
> Date:   Wed, 21 Feb 2018 10:07:15 +0100
> Message-Id: <20180221090715.2853-1-christian.koenig@amd.com>
> X-Mailer: git-send-email 2.14.1
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> Sender: linux-kernel-owner@vger.kernel.org
> Precedence: bulk
> List-ID: <linux-kernel.vger.kernel.org>
> X-Mailing-List: linux-kernel@vger.kernel.org
> X-TUID: AvE/m/C3rDgS
> 
> >
> > Signed-off-by: Veronika Kabatova <vkabatov@redhat.com>
> > ---
> >  patchwork/views/utils.py | 8 ++------
> >  1 file changed, 2 insertions(+), 6 deletions(-)
> >
> > diff --git a/patchwork/views/utils.py b/patchwork/views/utils.py
> > index 84682b8..5c28308 100644
> > --- a/patchwork/views/utils.py
> > +++ b/patchwork/views/utils.py
> > @@ -89,21 +89,17 @@ def _submission_to_mbox(submission):
> >      utc_timestamp = delta.seconds + delta.days * 24 * 3600
> >  
> >      mail = PatchMbox(body)
> > -    mail['Subject'] = submission.name
> >      mail['X-Patchwork-Submitter'] = email.utils.formataddr((
> >          str(Header(submission.submitter.name, mail.patch_charset)),
> >          submission.submitter.email))
> >      mail['X-Patchwork-Id'] = str(submission.id)
> >      if is_patch and submission.delegate:
> >          mail['X-Patchwork-Delegate'] = str(submission.delegate.email)
> > -    mail['Message-Id'] = submission.msgid
> >      mail.set_unixfrom('From patchwork ' + submission.date.ctime())
> >  
> > -    copied_headers = ['To', 'Cc', 'Date', 'From', 'List-Id']
> >      orig_headers = HeaderParser().parsestr(str(submission.headers))
> > -    for header in copied_headers:
> > -        if header in orig_headers:
> > -            mail[header] = orig_headers[header]
> > +    for header in orig_headers.keys():
> > +        mail[header] = orig_headers[header]
> >  
> >      if 'Date' not in mail:
> >          mail['Date'] = email.utils.formatdate(utc_timestamp)
> > --
> > 2.13.6
> >
> > _______________________________________________
> > Patchwork mailing list
> > Patchwork@lists.ozlabs.org
> > https://lists.ozlabs.org/listinfo/patchwork
>
Daniel Axtens April 5, 2018, 1:47 p.m. UTC | #4
Johannes Berg <johannes@sipsolutions.net> writes:

> On Thu, 2018-04-05 at 19:58 +1000, Daniel Axtens wrote:
>> vkabatov@redhat.com writes:
>> 
>> > From: Veronika Kabatova <vkabatov@redhat.com>
>> > 
>> > Solves issue #165 (Exported mboxes should include In-Reply-To,
>> > References, etc headers). Instead of including only a few chosen ones,
>> > all received headers are added to mboxes.
>> 
>> Thanks for the patch.
>> 
>> I'm a little worried that this will get really messy - I've included a
>> snippet of headers from an email from an unrelated bug below. Maybe we
>> don't care - I guess this isn't really for human consumption but is for
>> consumption by e.g. git-am.
>> 
>> Alternatively we can blacklist headers: I don't think there's anything
>> worth having in Received, X-*, List-*, DKIM, ARC, SPF, etc. But I wonder
>> if this is just whack-a-mole in reverse.
>> 
>> Thoughts?
>
> I'm not really sure human consumption would be a worry for mbox files?
> Having the full headers - since it's sort of an email archive already -
> would be useful though.
>
> In particular, the change to keep the original Subject would be useful
> for us, as we have a script that automatically replies to the email
> saying "thank you, I've applied your patch" or similar.

Hmm, I think the hasher.py and patchwork-update-commits scripts were
designed to facilitate this sort of thing in a slightly more robust
way. But I've never really felt like I understood it, and I don't recall
any docs.

I hope that the change to the subject mangling doesn't break anything
else, but I can't imagine it would. [0]

> That said, using a dict for this is in general not quite right, since
> many header lines are valid multiple times, e.g. "Received:", but I'm
> not sure they're even all stored.

Right, you and Veronika have convinced me. Clearly as a developer of
patchwork my view is a bit skewed - I have to look at them fairly
frequently when people report parsing bugs so I forget that other people
don't do this.

Veronika - can you check on repeated headers issue that Johannes raises?

Thanks all.

Regards,
Daniel

[0] Obligatory xkcd - https://xkcd.com/1172/
>
> johannes
Veronika Kabatova April 5, 2018, 2:58 p.m. UTC | #5
----- Original Message -----
> From: "Daniel Axtens" <dja@axtens.net>
> To: "Johannes Berg" <johannes@sipsolutions.net>, vkabatov@redhat.com, patchwork@lists.ozlabs.org
> Sent: Thursday, April 5, 2018 3:47:03 PM
> Subject: Re: [PATCH] Include all email headers in mboxes
> 
> Johannes Berg <johannes@sipsolutions.net> writes:
> 
> > On Thu, 2018-04-05 at 19:58 +1000, Daniel Axtens wrote:
> >> vkabatov@redhat.com writes:
> >> 
> >> > From: Veronika Kabatova <vkabatov@redhat.com>
> >> > 
> >> > Solves issue #165 (Exported mboxes should include In-Reply-To,
> >> > References, etc headers). Instead of including only a few chosen ones,
> >> > all received headers are added to mboxes.
> >> 
> >> Thanks for the patch.
> >> 
> >> I'm a little worried that this will get really messy - I've included a
> >> snippet of headers from an email from an unrelated bug below. Maybe we
> >> don't care - I guess this isn't really for human consumption but is for
> >> consumption by e.g. git-am.
> >> 
> >> Alternatively we can blacklist headers: I don't think there's anything
> >> worth having in Received, X-*, List-*, DKIM, ARC, SPF, etc. But I wonder
> >> if this is just whack-a-mole in reverse.
> >> 
> >> Thoughts?
> >
> > I'm not really sure human consumption would be a worry for mbox files?
> > Having the full headers - since it's sort of an email archive already -
> > would be useful though.
> >
> > In particular, the change to keep the original Subject would be useful
> > for us, as we have a script that automatically replies to the email
> > saying "thank you, I've applied your patch" or similar.
> 
> Hmm, I think the hasher.py and patchwork-update-commits scripts were
> designed to facilitate this sort of thing in a slightly more robust
> way. But I've never really felt like I understood it, and I don't recall
> any docs.
> 
> I hope that the change to the subject mangling doesn't break anything
> else, but I can't imagine it would. [0]
> 
> > That said, using a dict for this is in general not quite right, since
> > many header lines are valid multiple times, e.g. "Received:", but I'm
> > not sure they're even all stored.
> 
> Right, you and Veronika have convinced me. Clearly as a developer of
> patchwork my view is a bit skewed - I have to look at them fairly
> frequently when people report parsing bugs so I forget that other people
> don't do this.
> 
> Veronika - can you check on repeated headers issue that Johannes raises?
> 

Right, this won't be too friendly in case of duplicate header keys. We do
store all headers with duplicate keys so I can easily change that and post
updated patch shortly.

However, since this was brought up, the same situation is present in
the API -- only one of the duplicate headers is shown. Based on my testing,
the code on our side does return correct values so this should be a problem
with the REST framework or something similar. I might take a look at it
later but can't promise I'll be able to fix it easily.


Thanks,
Veronika


> Thanks all.
> 
> Regards,
> Daniel
> 
> [0] Obligatory xkcd - https://xkcd.com/1172/
> >
> > johannes
>
Daniel Axtens April 5, 2018, 3:47 p.m. UTC | #6
>> > That said, using a dict for this is in general not quite right, since
>> > many header lines are valid multiple times, e.g. "Received:", but I'm
>> > not sure they're even all stored.
>> 
>> Right, you and Veronika have convinced me. Clearly as a developer of
>> patchwork my view is a bit skewed - I have to look at them fairly
>> frequently when people report parsing bugs so I forget that other people
>> don't do this.
>> 
>> Veronika - can you check on repeated headers issue that Johannes raises?
>> 
>
> Right, this won't be too friendly in case of duplicate header keys. We do
> store all headers with duplicate keys so I can easily change that and post
> updated patch shortly.
>
> However, since this was brought up, the same situation is present in
> the API -- only one of the duplicate headers is shown. Based on my testing,
> the code on our side does return correct values so this should be a problem
> with the REST framework or something similar. I might take a look at it
> later but can't promise I'll be able to fix it easily.

Thanks! Fixing one problem at a time is fine - feel free to leave the
API for later. I think it'll need a special serializer so we may need
Stephen's expertise.

Regards,
Daniel
diff mbox series

Patch

diff --git a/patchwork/views/utils.py b/patchwork/views/utils.py
index 84682b8..5c28308 100644
--- a/patchwork/views/utils.py
+++ b/patchwork/views/utils.py
@@ -89,21 +89,17 @@  def _submission_to_mbox(submission):
     utc_timestamp = delta.seconds + delta.days * 24 * 3600
 
     mail = PatchMbox(body)
-    mail['Subject'] = submission.name
     mail['X-Patchwork-Submitter'] = email.utils.formataddr((
         str(Header(submission.submitter.name, mail.patch_charset)),
         submission.submitter.email))
     mail['X-Patchwork-Id'] = str(submission.id)
     if is_patch and submission.delegate:
         mail['X-Patchwork-Delegate'] = str(submission.delegate.email)
-    mail['Message-Id'] = submission.msgid
     mail.set_unixfrom('From patchwork ' + submission.date.ctime())
 
-    copied_headers = ['To', 'Cc', 'Date', 'From', 'List-Id']
     orig_headers = HeaderParser().parsestr(str(submission.headers))
-    for header in copied_headers:
-        if header in orig_headers:
-            mail[header] = orig_headers[header]
+    for header in orig_headers.keys():
+        mail[header] = orig_headers[header]
 
     if 'Date' not in mail:
         mail['Date'] = email.utils.formatdate(utc_timestamp)