diff mbox series

[U-Boot,v2] patman: Use the Change-Id, version, and prefix in the Message-Id

Message ID 20190903131419.v2.1.Ie6289f437ae533d7fcaddfcee9202f0e92c6b2b9@changeid
State Changes Requested
Delegated to: Simon Glass
Headers show
Series [U-Boot,v2] patman: Use the Change-Id, version, and prefix in the Message-Id | expand

Commit Message

Doug Anderson Sept. 3, 2019, 8:15 p.m. UTC
As per the centithread on ksummit-discuss [1], there are folks who
feel that if a Change-Id is present in a developer's local commit that
said Change-Id could be interesting to include in upstream posts.
Specifically if two commits are posted with the same Change-Id there's
a reasonable chance that they are either the same commit or a newer
version of the same commit.  Specifically this is because that's how
gerrit has trained people to work.

There is much angst about Change-Id in upstream Linux, but one thing
that seems safe and non-controversial is to include the Change-Id as
part of the string of crud that makes up a Message-Id.

Let's give that a try.

In theory (if there is enough adoption) this could help a tool more
reliably find various versions of a commit.  This actually might work
pretty well for U-Boot where (I believe) quite a number of developers
use patman, so there could be critical mass (assuming that enough of
these people also use a git hook that adds Change-Id to their
commits).  I was able to find this git hook by searching for "gerrit
change id git hook" in my favorite search engine.

In theory one could imagine something like this could be integrated
into other tools, possibly even git-send-email.  Getting it into
patman seems like a sane first step, though.

NOTE: this patch is being posted using a patman containing this patch,
so you should be able to see the Message-Id of this patch and see that
it contains my local Change-Id, which ends in 2b9 if you want to
check.

[1] https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2019-August/006739.html

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

Changes in v2:
- Add a "v" before the version part of the Message-Id
- Reorder the parts of the Message-Id as per Johannes.

 tools/patman/README         |  8 +++++-
 tools/patman/commit.py      |  3 ++
 tools/patman/patchstream.py | 57 +++++++++++++++++++++++++++++++++++--
 3 files changed, 65 insertions(+), 3 deletions(-)

Comments

Johannes Berg Sept. 5, 2019, 7:48 p.m. UTC | #1
On Tue, 2019-09-03 at 13:15 -0700, Douglas Anderson wrote:
> 
> Let's give that a try.
> 
> In theory (if there is enough adoption) this could help a tool more
> reliably find various versions of a commit.

It's not quite as good as this (yet), but a very simple version of it
for git-send-email could be this:

--- save as .git/hooks/sendemail-validate ---
#!/bin/sh

set -e

changeid=$(sed 's/^Change-Id: \(I.*\)$/\1/;t;d' $1)
date=$(date +%s)
sed 's/^Change-Id: I.*$//;T;d' -i $1
sed "s/^From: /Message-Id: <$date-$changeid@changeid>\nFrom: /" -i $1
#--- end script ---


It won't do the RFC/which patch of a series etc. but I consider that
less interesting.

Actually, the important part for me is to be to be able to have change-
ids locally (e.g. for working with gerrit), and not have that leak out
to the community (that doesn't like them).

As a maintainer in the community, I'll also need to change the
.git/hooks/commit-msg script that comes with gerrit to not add a Change-
Id if the commit comes with a Link: tag already, but that should be
easy.

johannes
Johannes Berg Sept. 5, 2019, 7:52 p.m. UTC | #2
On Thu, 2019-09-05 at 21:48 +0200, Johannes Berg wrote:
> 
> As a maintainer in the community, I'll also need to change the
> .git/hooks/commit-msg script that comes with gerrit to not add a Change-
> Id if the commit comes with a Link: tag already, but that should be
> easy.

This, I should mention, is for the case where I apply a patch from
somebody else (and while at it I use the script that adds a Link: tag to
lore.kernel.org or some other archive recording the message-id), but
then I modify the commit slightly and amend it. This would invoke the
commit-msg hook and add a Change-Id, but that's not desired here.

johannes
Simon Glass Sept. 12, 2019, 4:41 p.m. UTC | #3
On Tue, 3 Sep 2019 at 14:15, Douglas Anderson <dianders@chromium.org> wrote:
>
> As per the centithread on ksummit-discuss [1], there are folks who
> feel that if a Change-Id is present in a developer's local commit that
> said Change-Id could be interesting to include in upstream posts.
> Specifically if two commits are posted with the same Change-Id there's
> a reasonable chance that they are either the same commit or a newer
> version of the same commit.  Specifically this is because that's how
> gerrit has trained people to work.
>
> There is much angst about Change-Id in upstream Linux, but one thing
> that seems safe and non-controversial is to include the Change-Id as
> part of the string of crud that makes up a Message-Id.
>
> Let's give that a try.
>
> In theory (if there is enough adoption) this could help a tool more
> reliably find various versions of a commit.  This actually might work
> pretty well for U-Boot where (I believe) quite a number of developers
> use patman, so there could be critical mass (assuming that enough of
> these people also use a git hook that adds Change-Id to their
> commits).  I was able to find this git hook by searching for "gerrit
> change id git hook" in my favorite search engine.
>
> In theory one could imagine something like this could be integrated
> into other tools, possibly even git-send-email.  Getting it into
> patman seems like a sane first step, though.
>
> NOTE: this patch is being posted using a patman containing this patch,
> so you should be able to see the Message-Id of this patch and see that
> it contains my local Change-Id, which ends in 2b9 if you want to
> check.
>
> [1] https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2019-August/006739.html
>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
>
> Changes in v2:
> - Add a "v" before the version part of the Message-Id
> - Reorder the parts of the Message-Id as per Johannes.
>
>  tools/patman/README         |  8 +++++-
>  tools/patman/commit.py      |  3 ++
>  tools/patman/patchstream.py | 57 +++++++++++++++++++++++++++++++++++--
>  3 files changed, 65 insertions(+), 3 deletions(-)

Reviewed-by: Simon Glass <sjg@chromium.org>
Doug Anderson Sept. 16, 2019, 6:09 p.m. UTC | #4
Johannes,

On Thu, Sep 5, 2019 at 12:48 PM Johannes Berg <johannes@sipsolutions.net> wrote:
>
> On Tue, 2019-09-03 at 13:15 -0700, Douglas Anderson wrote:
> >
> > Let's give that a try.
> >
> > In theory (if there is enough adoption) this could help a tool more
> > reliably find various versions of a commit.
>
> It's not quite as good as this (yet), but a very simple version of it
> for git-send-email could be this:
>
> --- save as .git/hooks/sendemail-validate ---
> #!/bin/sh
>
> set -e
>
> changeid=$(sed 's/^Change-Id: \(I.*\)$/\1/;t;d' $1)
> date=$(date +%s)
> sed 's/^Change-Id: I.*$//;T;d' -i $1
> sed "s/^From: /Message-Id: <$date-$changeid@changeid>\nFrom: /" -i $1
> #--- end script ---
>
>
> It won't do the RFC/which patch of a series etc. but I consider that
> less interesting.
>
> Actually, the important part for me is to be to be able to have change-
> ids locally (e.g. for working with gerrit), and not have that leak out
> to the community (that doesn't like them).
>
> As a maintainer in the community, I'll also need to change the
> .git/hooks/commit-msg script that comes with gerrit to not add a Change-
> Id if the commit comes with a Link: tag already, but that should be
> easy.

Do you have any idea how to encourage adoption of your git hook?
Would it make sense to do something like check it in to the Linux
kernel somewhere?

-Doug
Johannes Berg Sept. 16, 2019, 7:02 p.m. UTC | #5
Hi Doug,

> > Actually, the important part for me is to be to be able to have change-
> > ids locally (e.g. for working with gerrit), and not have that leak out
> > to the community (that doesn't like them).
> > 
> > As a maintainer in the community, I'll also need to change the
> > .git/hooks/commit-msg script that comes with gerrit to not add a Change-
> > Id if the commit comes with a Link: tag already, but that should be
> > easy.
> 
> Do you have any idea how to encourage adoption of your git hook?
> Would it make sense to do something like check it in to the Linux
> kernel somewhere?

Not sure, that might backfire completely into getting modified to be a
hook that simply *strips* it, rather than preserving it in the message
ID?

I suspect one thing that might help would be to advertise such a hook
widely where the users are most likely to be, with gerrit? I for one
mostly added this because I need to use gerrit sometimes, and

 (1) stripping the change IDs manually is annoying, and
 (2) stripping the change IDs means I can't actually track my own
     patches

Also, I did in the end modify my commit-msg hook to abort if it finds a
Link:


@@ ... @@
	if grep -i '^Change-Id:' "$MSG" >/dev/null
	then
		return
	fi

+	if grep -i '^Link:' "$MSG" >/dev/null
+	then
+		return
+	fi

	id=`_gen_ChangeId`


and have another hook:

$ cat .git/hooks/applypatch-msg 
#!/bin/sh
. git-sh-setup
# the sfid stuff is sometimes added by my spam filter ...
perl -pi -e 's|^Message-Id:\s*<?([^>]+)>?(\s*\(sfid-.*)?$|Link: 
https://lore.kernel.org/r/$1|g;' "$1"
test -x "$GIT_DIR/hooks/commit-msg" &&
        exec "$GIT_DIR/hooks/commit-msg" ${1+"$@"}
:
(EOF)

that adds the link to patches I pick up from others (since I'm also a
maintainer).

Combined, this means I no longer have to worry about change IDs getting
upstream into my pull requests (that I send out as a kernel maintainer,
if I edit patches I've applied) while I still create them for things I
author locally. This relies on the fact that I even send out my own
patches to the list, and then apply them from there ... at least
usually, to give others any idea.


Anyway, what's the point of all that discussion? I *mostly* did this out
of necessity, to avoid having to manually edit out the change-IDs when I
send patches or when I push code out to kernel.org after editing patches
I've applied from others, but I can still create them locally by default
for the cases where I have to use gerrit.

Not sure if this is the case for many people though, I could just have
used different kernel trees for all those different roles, it's just
harder for me to manage.

Ultimately, to drive adoption, I see two ways

 (1) convince people (other than Linus etc.) that commit tags are
     useful, and those who are convinced can use this stuff to maintain
     it across the kernel workflow (assuming the maintainer uses a
     script such as the apply-msg above, but that seems to be picking up
     adoption)
 (2) show it to those who have a need to work with gerrit or similar
     like me, for whom it covers not sending out the frowned-upon
     change-IDs to the Linux community.

For (1), I guess this should also come with the right commit-msg hook
and the sendemail-validate, since they don't already have the commit-msg 
hook from gerrit.

johannes
Simon Glass Sept. 27, 2019, 1:48 a.m. UTC | #6
Hi Doug,

On Tue, 3 Sep 2019 at 13:15, Douglas Anderson <dianders@chromium.org> wrote:
>
> As per the centithread on ksummit-discuss [1], there are folks who
> feel that if a Change-Id is present in a developer's local commit that
> said Change-Id could be interesting to include in upstream posts.
> Specifically if two commits are posted with the same Change-Id there's
> a reasonable chance that they are either the same commit or a newer
> version of the same commit.  Specifically this is because that's how
> gerrit has trained people to work.
>
> There is much angst about Change-Id in upstream Linux, but one thing
> that seems safe and non-controversial is to include the Change-Id as
> part of the string of crud that makes up a Message-Id.
>
> Let's give that a try.
>
> In theory (if there is enough adoption) this could help a tool more
> reliably find various versions of a commit.  This actually might work
> pretty well for U-Boot where (I believe) quite a number of developers
> use patman, so there could be critical mass (assuming that enough of
> these people also use a git hook that adds Change-Id to their
> commits).  I was able to find this git hook by searching for "gerrit
> change id git hook" in my favorite search engine.
>
> In theory one could imagine something like this could be integrated
> into other tools, possibly even git-send-email.  Getting it into
> patman seems like a sane first step, though.
>
> NOTE: this patch is being posted using a patman containing this patch,
> so you should be able to see the Message-Id of this patch and see that
> it contains my local Change-Id, which ends in 2b9 if you want to
> check.
>
> [1] https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2019-August/006739.html
>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
>
> Changes in v2:
> - Add a "v" before the version part of the Message-Id
> - Reorder the parts of the Message-Id as per Johannes.
>
>  tools/patman/README         |  8 +++++-
>  tools/patman/commit.py      |  3 ++
>  tools/patman/patchstream.py | 57 +++++++++++++++++++++++++++++++++++--
>  3 files changed, 65 insertions(+), 3 deletions(-)

Sadly this seems to cause a test failure (patman --test). Can you please check?

Regards,
SImon
Doug Anderson Sept. 27, 2019, 4:26 p.m. UTC | #7
Hi,

On Thu, Sep 26, 2019 at 6:50 PM Simon Glass <sjg@chromium.org> wrote:
>
> Hi Doug,
>
> On Tue, 3 Sep 2019 at 13:15, Douglas Anderson <dianders@chromium.org> wrote:
> >
> > As per the centithread on ksummit-discuss [1], there are folks who
> > feel that if a Change-Id is present in a developer's local commit that
> > said Change-Id could be interesting to include in upstream posts.
> > Specifically if two commits are posted with the same Change-Id there's
> > a reasonable chance that they are either the same commit or a newer
> > version of the same commit.  Specifically this is because that's how
> > gerrit has trained people to work.
> >
> > There is much angst about Change-Id in upstream Linux, but one thing
> > that seems safe and non-controversial is to include the Change-Id as
> > part of the string of crud that makes up a Message-Id.
> >
> > Let's give that a try.
> >
> > In theory (if there is enough adoption) this could help a tool more
> > reliably find various versions of a commit.  This actually might work
> > pretty well for U-Boot where (I believe) quite a number of developers
> > use patman, so there could be critical mass (assuming that enough of
> > these people also use a git hook that adds Change-Id to their
> > commits).  I was able to find this git hook by searching for "gerrit
> > change id git hook" in my favorite search engine.
> >
> > In theory one could imagine something like this could be integrated
> > into other tools, possibly even git-send-email.  Getting it into
> > patman seems like a sane first step, though.
> >
> > NOTE: this patch is being posted using a patman containing this patch,
> > so you should be able to see the Message-Id of this patch and see that
> > it contains my local Change-Id, which ends in 2b9 if you want to
> > check.
> >
> > [1] https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2019-August/006739.html
> >
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > ---
> >
> > Changes in v2:
> > - Add a "v" before the version part of the Message-Id
> > - Reorder the parts of the Message-Id as per Johannes.
> >
> >  tools/patman/README         |  8 +++++-
> >  tools/patman/commit.py      |  3 ++
> >  tools/patman/patchstream.py | 57 +++++++++++++++++++++++++++++++++++--
> >  3 files changed, 65 insertions(+), 3 deletions(-)
>
> Sadly this seems to cause a test failure (patman --test). Can you please check?

It's been so long since I worked on patman that I totally forgot about
patman --test.  Doh!

I've posted v3 and tests pass now.  Hopefully it looks OK?

-Doug
Johannes Berg Oct. 11, 2019, 7:52 a.m. UTC | #8
Doug,

> > --- save as .git/hooks/sendemail-validate ---
> > #!/bin/sh
> > 
> > set -e
> > 
> > changeid=$(sed 's/^Change-Id: \(I.*\)$/\1/;t;d' $1)
> > date=$(date +%s)
> > sed 's/^Change-Id: I.*$//;T;d' -i $1
> > sed "s/^From: /Message-Id: <$date-$changeid@changeid>\nFrom: /" -i $1
> > #--- end script ---

> Do you have any idea how to encourage adoption of your git hook?
> Would it make sense to do something like check it in to the Linux
> kernel somewhere?

I still don't really know, but I wanted to post that I changed it now,
because of uniqueness problems if there *isn't* a changeid, and to make
the format a bit more like yours:

--- save as .git/hooks/sendemail-validate ---
#!/bin/bash

set -e

changeid=$(sed 's/^Change-Id: \(I.*\)$/\1/;t;d' $1)
# let git do its own if there's no change-id
[ "$changeid" = "" ] && exit 0

commit=$(sed 's/^From.*\([0-9a-fA-F]\{40\}\).*/\1/;t;d' $1)
# must have a commit ID - if this fails git changed
[ "$commit" == "" ] && exit 2

commit=${commit:0:12}

date=$(date +%Y%m%d%H%M%S)

# remove change-id
sed 's/^Change-Id: I.*$//;T;d' -i $1
# and add message-id
sed "s/^From: /Message-Id: <$date.$commit.$changeid@changeid>\nFrom: /" -i $1
#--- end script ---

(also at https://p.sipsolutions.net/a1ebdd33abb598b4.txt)

johannes
diff mbox series

Patch

diff --git a/tools/patman/README b/tools/patman/README
index 7917fc8bdc33..02d582974495 100644
--- a/tools/patman/README
+++ b/tools/patman/README
@@ -259,12 +259,18 @@  Series-process-log: sort, uniq
 	unique entries. If omitted, no change log processing is done.
 	Separate each tag with a comma.
 
+Change-Id:
+	This tag is stripped out but is used to generate the Message-Id
+	of the emails that will be sent. When you keep the Change-Id the
+	same you are asserting that this is a slightly different version
+	(but logically the same patch) as other patches that have been
+	sent out with the same Change-Id.
+
 Various other tags are silently removed, like these Chrome OS and
 Gerrit tags:
 
 BUG=...
 TEST=...
-Change-Id:
 Review URL:
 Reviewed-on:
 Commit-xxxx: (except Commit-notes)
diff --git a/tools/patman/commit.py b/tools/patman/commit.py
index 2bf3a0ba5b92..48d0529c5365 100644
--- a/tools/patman/commit.py
+++ b/tools/patman/commit.py
@@ -21,6 +21,8 @@  class Commit:
             The dict is indexed by change version (an integer)
         cc_list: List of people to aliases/emails to cc on this commit
         notes: List of lines in the commit (not series) notes
+        change_id: the Change-Id: tag that was stripped from this commit
+            and can be used to generate the Message-Id.
     """
     def __init__(self, hash):
         self.hash = hash
@@ -30,6 +32,7 @@  class Commit:
         self.cc_list = []
         self.signoff_set = set()
         self.notes = []
+        self.change_id = None
 
     def AddChange(self, version, info):
         """Add a new change line to the change list for a version.
diff --git a/tools/patman/patchstream.py b/tools/patman/patchstream.py
index b6455b0fa383..2dd6a778d4ff 100644
--- a/tools/patman/patchstream.py
+++ b/tools/patman/patchstream.py
@@ -2,6 +2,7 @@ 
 # Copyright (c) 2011 The Chromium OS Authors.
 #
 
+import datetime
 import math
 import os
 import re
@@ -14,7 +15,7 @@  import gitutil
 from series import Series
 
 # Tags that we detect and remove
-re_remove = re.compile('^BUG=|^TEST=|^BRANCH=|^Change-Id:|^Review URL:'
+re_remove = re.compile('^BUG=|^TEST=|^BRANCH=|^Review URL:'
     '|Reviewed-on:|Commit-\w*:')
 
 # Lines which are allowed after a TEST= line
@@ -32,6 +33,9 @@  re_cover_cc = re.compile('^Cover-letter-cc: *(.*)')
 # Patch series tag
 re_series_tag = re.compile('^Series-([a-z-]*): *(.*)')
 
+# Change-Id will be used to generate the Message-Id and then be stripped
+re_change_id = re.compile('^Change-Id: *(.*)')
+
 # Commit series tag
 re_commit_tag = re.compile('^Commit-([a-z-]*): *(.*)')
 
@@ -156,6 +160,7 @@  class PatchStream:
 
         # Handle state transition and skipping blank lines
         series_tag_match = re_series_tag.match(line)
+        change_id_match = re_change_id.match(line)
         commit_tag_match = re_commit_tag.match(line)
         cover_match = re_cover.match(line)
         cover_cc_match = re_cover_cc.match(line)
@@ -177,7 +182,7 @@  class PatchStream:
             self.state = STATE_MSG_HEADER
 
         # If a tag is detected, or a new commit starts
-        if series_tag_match or commit_tag_match or \
+        if series_tag_match or commit_tag_match or change_id_match or \
            cover_match or cover_cc_match or signoff_match or \
            self.state == STATE_MSG_HEADER:
             # but we are already in a section, this means 'END' is missing
@@ -275,6 +280,15 @@  class PatchStream:
                 self.AddToSeries(line, name, value)
                 self.skip_blank = True
 
+        # Detect Change-Id tags
+        elif change_id_match:
+            value = change_id_match.group(1)
+            if self.is_log:
+                if self.commit.change_id:
+                    raise ValueError("%s: Two Change-Ids: '%s' vs. '%s'" %
+                        (self.commit.hash, self.commit.change_id, value))
+                self.commit.change_id = value
+
         # Detect Commit-xxx tags
         elif commit_tag_match:
             name = commit_tag_match.group(1)
@@ -345,6 +359,41 @@  class PatchStream:
             self.warn.append('Found %d lines after TEST=' %
                     self.lines_after_test)
 
+    def WriteMessageId(self, outfd):
+        """Write the Message-Id into the output.
+
+        This is based on the Change-Id in the original patch, the version,
+        and the prefix.
+
+        Args:
+            outfd: Output stream file object
+        """
+        if not self.commit.change_id:
+            return
+
+        # In theory there is email.utils.make_msgid() which would be nice
+        # to use, but it already produces something way too long and thus
+        # will produce ugly commit lines if someone throws this into
+        # a "Link:" tag in the final commit.  So (sigh) roll our own.
+
+        # Start with the time; presumably we wouldn't send the same series
+        # with the same Change-Id at the exact same second.
+        parts = [datetime.datetime.now().strftime("%Y%m%d%H%M%S")]
+
+        # These seem like they would be nice to include.
+        if 'prefix' in self.series:
+            parts.append(self.series['prefix'])
+        if 'version' in self.series:
+            parts.append("v%s" % self.series['version'])
+
+        parts.append(str(self.commit.count + 1))
+
+        # The Change-Id must be last, right before the @
+        parts.append(self.commit.change_id)
+
+        # Join parts together with "." and write it out.
+        outfd.write('Message-Id: <%s@changeid>\n' % '.'.join(parts))
+
     def ProcessStream(self, infd, outfd):
         """Copy a stream from infd to outfd, filtering out unwanting things.
 
@@ -358,6 +407,9 @@  class PatchStream:
         fname = None
         last_fname = None
         re_fname = re.compile('diff --git a/(.*) b/.*')
+
+        self.WriteMessageId(outfd)
+
         while True:
             line = infd.readline()
             if not line:
@@ -481,6 +533,7 @@  def FixPatches(series, fnames):
     for fname in fnames:
         commit = series.commits[count]
         commit.patch = fname
+        commit.count = count
         result = FixPatch(backup_dir, fname, series, commit)
         if result:
             print('%d warnings for %s:' % (len(result), fname))